**Alessandra Russo Andy Schürr (Eds.)**

# **Fundamental Approaches to Software Engineering**

**21st International Conference, FASE 2018 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018 Thessaloniki, Greece, April 14–20, 2018, Proceedings**

# Lecture Notes in Computer Science 10802

Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

## Editorial Board

David Hutchison, UK Josef Kittler, UK Friedemann Mattern, Switzerland Moni Naor, Israel Bernhard Steffen, Germany Doug Tygar, USA

Takeo Kanade, USA Jon M. Kleinberg, USA John C. Mitchell, USA C. Pandu Rangan, India Demetri Terzopoulos, USA Gerhard Weikum, Germany

## Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany Benjamin C. Pierce, University of Pennsylvania, USA Bernhard Steffen, University of Dortmund, Germany Deng Xiaotie, City University of Hong Kong Jeannette M. Wing, Microsoft Research, Redmond, WA, USA More information about this series at http://www.springer.com/series/7407

# Fundamental Approaches to Software Engineering

21st International Conference, FASE 2018 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018 Thessaloniki, Greece, April 14–20, 2018 Proceedings

Editors Alessandra Russo Imperial College London London UK

Andy Schürr TU Darmstadt Darmstadt Germany

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-319-89362-4 ISBN 978-3-319-89363-1 (eBook) https://doi.org/10.1007/978-3-319-89363-1

Library of Congress Control Number: 2018937400

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG part of Springer Nature

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## ETAPS Foreword

Welcome to the proceedings of ETAPS 2018! After a somewhat coldish ETAPS 2017 in Uppsala in the north, ETAPS this year took place in Thessaloniki, Greece. I am happy to announce that this is the first ETAPS with gold open access proceedings. This means that all papers are accessible by anyone for free.

ETAPS 2018 was the 21st instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. Each conference has its own Program Committee (PC) and its own Steering Committee. The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations to programming language developments, analysis tools, formal approaches to software engineering, and security. Organizing these conferences in a coherent, highly synchronized conference program facilitates participation in an exciting event, offering attendees the possibility to meet many researchers working in different directions in the field, and to easily attend talks of different conferences. Before and after the main conference, numerous satellite workshops take place and attract many researchers from all over the globe.

ETAPS 2018 received 479 submissions in total, 144 of which were accepted, yielding an overall acceptance rate of 30%. I thank all the authors for their interest in ETAPS, all the reviewers for their peer reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2018 was enriched by the unifying invited speaker Martin Abadi (Google Brain, USA) and the conference-specific invited speakers (FASE) Pamela Zave (AT & T Labs, USA), (POST) Benjamin C. Pierce (University of Pennsylvania, USA), and (ESOP) Derek Dreyer (Max Planck Institute for Software Systems, Germany). Invited tutorials were provided by Armin Biere (Johannes Kepler University, Linz, Austria) on modern SAT solving and Fabio Somenzi (University of Colorado, Boulder, USA) on hardware verification. My sincere thanks to all these speakers for their inspiring and interesting talks!

ETAPS 2018 took place in Thessaloniki, Greece, and was organised by the Department of Informatics of the Aristotle University of Thessaloniki. The university was founded in 1925 and currently has around 75,000 students; it is the largest university in Greece. ETAPS 2018 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Panagiotis Katsaros (general chair), Ioannis Stamelos, Lefteris Angelis, George Rahonis, Nick Bassiliades, Alexander Chatzigeorgiou, Ezio Bartocci, Simon Bliudze, Emmanouela Stachtiari, Kyriakos Georgiadis, and Petros Stratis (EasyConferences).

The overall planning for ETAPS is the main responsibility of the Steering Committee, and in particular of its Executive Board. The ETAPS Steering Committee consists of an Executive Board and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Tallinn), and Lenore Zuck (Chicago). Other members of the Steering Committee are: Wil van der Aalst (Aachen), Parosh Abdulla (Uppsala), Amal Ahmed (Boston), Christel Baier (Dresden), Lujo Bauer (Pittsburgh), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Luis Caires (Lisbon), Jurriaan Hage (Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester), Marieke Huisman (Twente), Panagiotis Katsaros (Thessaloniki), Ralf Küsters (Stuttgart), Ugo Dal Lago (Bologna), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), Andrew M. Pitts (Cambridge), Alessandra Russo (London), Dave Sands (Göteborg), Don Sannella (Edinburgh), Andy Schürr (Darmstadt), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendees, organizers of the satellite workshops, and Springer for their support. I hope you all enjoy the proceedings of ETAPS 2018. Finally, a big thanks to Panagiotis and his local organization team for all their enormous efforts that led to a fantastic ETAPS in Thessaloniki!

February 2018 Joost-Pieter Katoen

## Preface

This book contains the proceedings of FASE 2018, the 21st International Conference on Fundamental Approaches to Software Engineering, held in Thessaloniki, Greece, in April 2018, as part of the annual European Joint Conferences on Theory and Practice of Software (ETAPS 2018).

As usual for FASE, the contributions combine the development of conceptual and methodological advances with their formal foundations, tool support, and evaluation on realistic or pragmatic cases. As a result, the volume contains regular research papers that cover a wide range of topics, such as program and system analysis, model transformations, configuration and synthesis, graph modeling and transformation, software product lines, test selection, as well as learning and inference. We hope that the community will find this volume engaging and worth reading.

The contributions included have been carefully selected. For the third time, FASE used a double-blind review process, as the past two years' experiments were considered valuable by authors and worth the additional effort of anonymizing the papers. We received 77 abstract submissions from 24 different countries, from which 63 full-paper submissions materialized. All papers were reviewed by three experts in the field, and after intense discussion, only 19 were accepted, giving an acceptance rate of 30%.

We thank the ETAPS 2018 general chair Panagiotis Katsaros, the ETAPS organizers, Ioannis Stamelos, Lefteris Angelis, and George Rahonis, the ETAPS publicity chairs, Ezio Bartocci and Simon Bliudze, as well as the ETAPS SC chair, Joost-Pieter Katoen, for their support during the whole process. We thank all the authors for their hard work and willingness to contribute. Last but not least, we thank all the Program Committee members and external reviewers, who invested time and effort in the selection process to ensure the scientific quality of the program.

February 2018 Alessandra Russo Andy Schürr

## Organization

#### Program Committee

- Ruth Breu, Universität Innsbruck, Austria
- Yuanfang Cai, Drexel University, USA
- Sagar Chaki, Carnegie Mellon University, USA
- Hana Chockler, King's College London, UK
- Ewen Denney, NASA Ames, USA
- Stefania Gnesi, ISTI-CNR, Italy
- Dilian Gurov, Royal Institute of Technology (KTH), Sweden
- Zhenjiang Hu, National Institute for Informatics, Japan
- Reiner Hähnle, Darmstadt University of Technology, Germany
- Valerie Issarny, Inria, France
- Einar Broch Johnsen, University of Oslo, Norway
- Gerti Kappel, Vienna University of Technology, Austria
- Ekkart Kindler, Technical University of Denmark, Denmark
- Kim Mens, Université catholique de Louvain, Belgium
- Fernando Orejas, Universitat Politècnica de Catalunya, Spain
- Fabrizio Pastore, University of Luxembourg, Luxembourg
- Arend Rensink, Universiteit Twente, The Netherlands
- Leila Ribeiro, Universidade Federal do Rio Grande do Sul, Brazil
- Julia Rubin, The University of British Columbia, Canada
- Bernhard Rumpe, RWTH Aachen, Germany
- Alessandra Russo, Imperial College London, UK
- Rick Salay, University of Toronto, Canada
- Ina Schaefer, Technische Universität Braunschweig, Germany
- Andy Schürr, Darmstadt University of Technology, Germany
- Marjan Sirjani, Reykjavik University, Iceland
- Wil Van der Aalst, RWTH Aachen, Germany
- Daniel Varro, Budapest University of Technology and Economics, Hungary
- Virginie Wiels, ONERA/DTIM, France
- Yingfei Xiong, Peking University, China
- Didar Zowghi, University of Technology Sydney, Australia

## Additional Reviewers

Adam, Kai; Ahmed, Khaled E.; Alrajeh, Dalal; Auer, Florian; Basile, Davide; Bergmann, Gábor; Bill, Robert; Bubel, Richard; Búr, Márton; Chen, Yifan; Cicchetti, Antonio; de Vink, Erik; Dulay, Naranker; Feng, Qiong; Guimaraes, Everton; Haeusler, Martin; Haglund, Jonas; Haubrich, Olga; Herda, Mihai; Hillemacher, Steffen; Huber, Michael; Jafari, Ali; Jiang, Jiajun; Johansen, Christian; Joosten, Sebastiaan; Kamburjan, Eduard; Kautz, Oliver; Khamespanah, Ehsan; Knüppel, Alexander; Laurent, Nicolas; Leblebici, Erhan; Liang, Jingjing; Lindner, Andreas; Lity, Sascha; Lochau, Malte; Markthaler, Matthias; Mauro, Jacopo; Melgratti, Hernan; Micskei, Zoltan; Mohaqeqi, Morteza; Mousavi, Mohamad; Nesic, Damir; Nieke, Michael; Pun, Ka I.; Saake, Gunter; Sauerwein, Clemens; Schlatte, Rudolf; Schuster, Sven; Seidl, Martina; Semeráth, Oszkár; Shaver, Chris; Shumeiko, Igor; Steffen, Martin; Steinebach, Martin; Steinhöfel, Dominic; Stolz, Volker; Tapia Tarifa, Silvia Lizeth; Ter Beek, Maurice H.; Tiezzi, Francesco; Varga, Simon; Wally, Bernhard; Wang, Bo; Weckesser, Markus; Whiteside, Iain; Wimmer, Manuel; Wolny, Sabine; Xiao, Lu; Yue, Ruru

## Contents

#### Model-Based Software Development

Diego Marmsoler



#### Specification and Program Testing


#### Family-Based Software Development


# Model-Based Software Development

# A Formal Framework for Incremental Model Slicing

Gabriele Taentzer<sup>1</sup>, Timo Kehrer<sup>2</sup>, Christopher Pietsch<sup>3(B)</sup>, and Udo Kelter<sup>3</sup>

<sup>1</sup> Philipps-Universität Marburg, Marburg, Germany

<sup>2</sup> Humboldt-Universität zu Berlin, Berlin, Germany <sup>3</sup> University of Siegen, Siegen, Germany cpietsch@informatik.uni-siegen.de

Abstract. Program slicing is a technique which can determine the simplest program possible that maintains the meaning of the original program w.r.t. a slicing criterion. The concept of slicing has been transferred to models, in particular to statecharts. In addition to the classical use cases of slicing adopted from the field of program understanding, model slicing is also motivated by specifying submodels of interest to be further processed more efficiently, thus dealing with scalability issues when working with very large models. Slices are often updated throughout specific software development tasks. Such a slice update can be performed by creating the new slice from scratch or by incrementally updating the existing slice. In this paper, we present a formal framework for defining model slicers that support incremental slice updates. This framework abstracts from the behavior of concrete slicers as well as from the concrete model modification approach. It forms a guideline for defining incremental model slicers independent of the underlying slicer's semantics. Incremental slice updates are shown to be equivalent to non-incremental ones. Furthermore, we present a framework instantiation based on the concept of edit scripts defining application sequences of model transformation rules. We implemented two concrete model slicers for this instantiation based on the Eclipse Modeling Framework.

## 1 Introduction

Program slicing as introduced by Weiser [1] is a technique which determines those parts of a program (the *slice*) which may affect the values of a set of (user-)selected variables at a specific point (the *slicing criterion*). Since the seminal work of Weiser, which calculates a slice by utilizing static data and control flow analysis and which primarily focuses on assisting developers in debugging, a plethora of program slicing techniques addressing a broad range of use cases have been proposed [2].
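Weiser's idea can be illustrated on a toy straight-line program. The sketch below is illustrative only and not from the paper: each statement is represented as a pair of the assigned variable and the set of variables it reads, and a backward pass collects exactly the statements that may affect the criterion variable.

```python
def backward_slice(stmts, criterion):
    """Return the indices of statements that may affect `criterion`.

    stmts: list of (target_variable, set_of_variables_read).
    """
    relevant = {criterion}                    # variables still of interest
    kept = []
    for i in range(len(stmts) - 1, -1, -1):   # walk the program backwards
        target, reads = stmts[i]
        if target in relevant:
            kept.append(i)                    # statement defines a relevant var
            relevant.discard(target)          # its definition is now explained
            relevant |= reads                 # but its inputs become relevant
    return sorted(kept)

# x = 1; y = 2; z = x + 1; w = y + z
program = [("x", set()), ("y", set()), ("z", {"x"}), ("w", {"y", "z"})]
```

Slicing on `z` drops the irrelevant assignment to `y`, while slicing on `w` keeps the whole program; real slicers additionally handle control flow, which this sketch omits.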

With the advent of Model-Driven Engineering (MDE) [3], models rather than source code play the role of primary software development artifacts. Similar use cases as known from program slicing must be supported for model slicing [4–6]. In addition to classical use cases adopted from the field of program understanding, model slicing is often motivated by scalability issues when working with very large models [7,8], which has often been mentioned as one of the biggest obstacles in applying MDE in practice [9,10]. Modeling frameworks such as the Eclipse Modeling Framework (EMF) and widely-used model management tools do not scale beyond a few tens of thousands of model elements [11], while large-scale industrial models are considerably larger [12]. As a consequence, such models cannot even be edited in standard model editors. Thus, the *extraction of editable submodels from a larger model* is the only viable solution to support an efficient yet independent editing of huge monolithic models [8]. Further example scenarios in which model slices may be constructed for the sake of efficiency include model checkers, test suite generators, etc., in order to reduce runtimes and memory consumption.

Slice criteria are often modified during software development tasks. This leads to corresponding *slice updates* (also called slice *adaptations* in [8]). During a debugging session, e.g., the slicing criterion might need to be modified in order to closer inspect different debugging hypotheses. The independent editing of submodels is another example of this. Here, a slice created for an initial slicing criterion can turn out to be inappropriate, most typically because additional model elements are desired or because the slice is still too large. These *slice update* scenarios have in common that the original slicing criterion is modified and that the existing slice must be updated w.r.t. the new slicing criterion.

Model slicing is faced with two challenging requirements which do not exist or which are of minor importance for traditional program slicers. First, the increasing importance and prevalence of domain-specific modeling languages (DSMLs) as well as a considerable number of different use cases lead to a huge number of different concrete slicers; examples will be presented in Sect. 2. Thus, methods for developing model slicers should abstract from a slicer's concrete behavior (and thus from concrete modeling languages) as far as possible. Ideally, model slicers should be generic in the sense that the behavior of a slicer is *adaptable* with moderate configuration effort [7]. Second, rather than creating a new slice from scratch for a modified slicing criterion, slices must often be updated *incrementally*. This is indispensable for all use cases where slices are edited by developers since otherwise these slice edits would be blindly overwritten [8]. In addition, incremental slice updating is a desirable feature when it is more efficient than creating the slice from scratch. To date, both requirements have been insufficiently addressed in the literature.

In this paper, we present a fundamental methodology for developing model slicers which abstract from the behavior of a concrete slicer and which support incremental model slicing. To be independent of a concrete DSML and use cases, we restrict ourselves to static slicing in order to support both executable and non-executable models. We make the following contributions:

1. A formal framework for incremental model slicing which can function as a guideline for defining adaptable and incremental model slicers (s. Sect. 3). This framework is based on graph-based models and model modifications and abstracts from the behavior of concrete slicers as well as from the concrete model modification approach. Within this framework we show that incremental slice updates are equivalent to non-incremental ones.

2. An instantiation of this formal framework in which incremental model slicers are specified by model patches, together with two concrete model slicers implementing this instantiation.

## 2 Motivating Example

In this section we introduce a running example to illustrate two use cases of model slicing and to motivate incremental slice updates.

Figure 1 shows an excerpt of the system model of the *Barbados Car Crash Crisis Management System (bCMS)* [13]. It describes the operations of a police and a fire department in case of a crisis situation.

Fig. 1. Excerpt of the system model of the bCMS case study [13].

The system is modeled from different viewpoints. The class diagram models the key entities and their relationships from a static point of view. A police station coordinator (PS coordinator) and a fire station coordinator (FS coordinator) are responsible for coordinating and synchronizing the activities on the police and fire station during a crisis. The interaction of both coordinators is managed by the respective system classes PSC System and FSC System which contain several operations for, e.g., establishing the communication between the coordinators and exchanging crisis details. The state machine diagram models the dynamic view of the class PSC System, i.e., its runtime behavior, for sending and receiving authorization credentials and crisis details to and from a FSC System. Initially, the PSC System is in the state Idle. The establishment of the communication can be triggered by calling the operation callFScoordinator or reqComFSC. In the composite state Authorising the system waits for exchanging the credentials of the PS and FS coordinator by calling the operation sendPScoordinatorCredentials and authFSC, or vice versa. On entering the composite state ExchangingCrisisDetails, details can be sent by the operation call sendPSCrisisDetails or details can be received by the operation call crisisDetailsFSC.

*Model Slicing.* Model slicers are used to find parts of interest in a given model M. These parts of M are specified by a *slicing criterion*, which is basically a set of model elements or, more formally, a submodel C of M. A slicer extends C with further model elements of M according to the purpose of the slicer.

We illustrate this with two use cases. Use case A is known as *backward slicing* in state-based models [4]. Given a set of states C in a statechart M as slicing criterion, the slicer determines all model elements which may have an effect on states in C. For instance, using S.1.0.1 (s. gray state in Fig. 1) as slicing criterion, the slicer recursively determines all incoming transitions and their sources, e.g., the transition with the event sendPScoordinatorCredentials and its source state S.1.0.0, until an initial state is reached.

The complete backward slice is indicated by the blue elements in the lower part of Fig. 1. The example shows that our general notion of a slicing criterion may be restricted by concrete model slicers. In this use case, the slicing criterion must not be an arbitrary submodel of a given larger model, but a very specific one, i.e., a set of states.
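To make the recursion of use case A concrete, here is a minimal sketch (not the authors' implementation) over a flat list of transitions; composite states, regions, and the class-diagram part of the model are deliberately ignored, and the state and event names below are a simplified fragment modeled after Fig. 1.

```python
def statechart_backward_slice(transitions, criterion_states):
    """Transitively collect incoming transitions and their source states."""
    states = set(criterion_states)
    kept = set()
    worklist = list(criterion_states)
    while worklist:
        current = worklist.pop()
        for src, event, tgt in transitions:
            if tgt == current and (src, event, tgt) not in kept:
                kept.add((src, event, tgt))      # transition may affect current
                if src not in states:
                    states.add(src)
                    worklist.append(src)         # recurse from the source state
    return states, kept

# Simplified transition list (source, event, target):
transitions = [
    ("Idle", "callFScoordinator", "S.1.0.0"),
    ("Idle", "reqComFSC", "S.1.1.0"),
    ("S.1.0.0", "sendPScoordinatorCredentials", "S.1.0.1"),
    ("S.1.1.0", "authFSC", "S.1.1.1"),
]
```

Slicing on `S.1.0.1` collects the transition labeled `sendPScoordinatorCredentials`, its source `S.1.0.0`, and finally `Idle`, mirroring the blue elements in Fig. 1.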

Use case B is the *extraction of editable models* as presented in [8]. Here, the slicing criterion C is given by a set of *requested model elements* of M. The purpose of this slicer is to find a submodel which is editable and which includes all requested model elements. For example, if we use the blue elements in the lower part of Fig. 1 as slicing criterion, the model slice also contains the orange elements in the upper part of Fig. 1, namely three operations, because the events of transitions in a statechart represent operations in the class diagram, as well as the class containing these operations.

*Slice Update.* The slicing criterion might be updated during a development task in order to obtain an updated slice. It is often desirable to update the slice rather than creating the new slice from scratch, e.g., because this is more efficient. Let us assume in use case A that the slicing criterion changes from S.1.0.1 to S.1.1.1. The resulting model slice only differs in the contained regions of the composite state Authorising. The upper region and its contained elements would be removed, while the lower region and its contained elements would be added. Next we could use the updated model slice from use case A as slicing criterion in use case B. In the related resulting model slice, the operation sendPScoordinatorCredentials would then be replaced by the operation authFSC.

#### 3 Formal Framework

We have seen in the motivating example that model slicers can differ considerably in their intended purpose. The formal framework we present in the following defines the fundamental concepts for model slicing and slice updates. This framework uses graph-based models and model modifications [14]. It shall serve as a guideline how to define model slicers that support incremental slice updates.

#### 3.1 Models as Graphs

Considering models, especially visual models, their concrete syntax is distinguished from their abstract syntax. In Fig. 1, a UML model is shown in its concrete representation. In the following, we reason about the underlying structure of models, i.e., their abstract syntax, which can be considered as a graph. The abstract syntax of a modeling language is usually defined by a meta-model which contains the type information about nodes and edges as well as additional constraints. We assume that a meta-model is formalized by an attributed graph; model graphs are defined as attributed graphs typed over the meta-model. This typing can be characterized by an attributed graph morphism [15]. In addition, graph constraints [16] may be used to specify additional requirements. Due to space limitations, we do not formalize constraints in this paper.

Definition 1 (Typed model graph and morphism). *Given two attributed graphs* $M$ *and* $MM$, *called* model *and* meta-model, *the typed model (graph) of* $M$ *is defined as* $M^T = (M, type_M)$ *with* $type_M : M \to MM$ *being an attributed graph morphism, called* typing morphism<sup>1</sup>. *Given two typed models* $M$ *and* $N$, *an attributed graph morphism* $f : M \to N$ *is called a* typed model morphism *if* $type_N \circ f = type_M$.

Fig. 2. Excerpt of a typed model graph.

*Example 1 (Typed model graph).* The left-hand side of Fig. 2 shows the model graph of an excerpt from the model depicted in Fig. 1. The model graph is typed over the meta-model depicted on the right-hand side of Fig. 2, which shows a simplified excerpt of the UML meta-model. Every node (and edge) of the model graph is mapped onto a node (or edge) of the type graph by the graph morphism $type_M : M \to MM$.

<sup>1</sup> In the following, we usually omit the adjective "attributed".
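The typing morphism can also be sketched programmatically. The following encoding is illustrative and not part of the paper's formalization: nodes and edges carry plain ids, the typing is given by two dictionaries, and we check the structure-compatibility condition that edge sources and targets commute with the typing.

```python
class Graph:
    """A graph with node ids and edges given as edge id -> (source, target)."""
    def __init__(self, nodes, edges):
        self.nodes = set(nodes)
        self.edges = dict(edges)

def is_typing_morphism(model, meta, type_node, type_edge):
    """Check that (type_node, type_edge) is a graph morphism model -> meta."""
    if not all(type_node[n] in meta.nodes for n in model.nodes):
        return False
    for e, (src, tgt) in model.edges.items():
        te = type_edge[e]
        if te not in meta.edges:
            return False
        meta_src, meta_tgt = meta.edges[te]
        # an edge's typed source/target must match the meta-model edge
        if type_node[src] != meta_src or type_node[tgt] != meta_tgt:
            return False
    return True

# Tiny fragment in the spirit of Fig. 2: a Region containing a State.
meta = Graph({"Region", "State"}, {"contains": ("Region", "State")})
model = Graph({"r1", "s1"}, {"c1": ("r1", "s1")})
typing_ok = is_typing_morphism(model, meta,
                               {"r1": "Region", "s1": "State"},
                               {"c1": "contains"})
```

Swapping the node typing (typing `r1` as a State) violates the commutation condition and the check rejects it, which is exactly what rules out ill-typed model graphs.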

Typed models and morphisms as defined above form the category $AGraphs_{ATG}$ in [15]. This category has various useful properties: since it is an adhesive HLR category with a class $\mathcal{M}$ of injective graph morphisms with isomorphic data mapping, it has pushouts and pullbacks where at least one morphism is in $\mathcal{M}$. These constructions can be considered as generalized union and intersection of models, defined component-wise on nodes and edges such that they are structure-compatible, and they are used to define the formal framework.

## 3.2 Model Modifications

Since we do not want to go into the details of any particular model transformation approach, the temporal change of models is specified abstractly by model modifications. Each model modification describes the original model, an intermediate one after all intended element deletions have been performed, and the resulting model after all element additions have been performed.

Definition 2 (Model modification). *Given two models* $M_1$ *and* $M_2$, *a (direct)* model modification $M_1 \Longrightarrow M_2$ *is a span of injective morphisms* $M_1 \xleftarrow{m_1} M_s \xrightarrow{m_2} M_2$.

- *(a) Model modification* $M \xleftarrow{id_M} M \xrightarrow{id_M} M$ *is called* identical.
- *(b) Model modification* $\emptyset \leftarrow \emptyset \rightarrow \emptyset$ *is called* empty.
- *(c) Model modification* $\emptyset \leftarrow \emptyset \rightarrow M$ *is called* model creation.
- *(d) Model modification* $M \leftarrow \emptyset \rightarrow \emptyset$ *is called* model deletion.
- *(e)* $M_2 \xleftarrow{m_2} M_s \xrightarrow{m_1} M_1$ *is called the* inverse modification *to* $M_1 \xleftarrow{m_1} M_s \xrightarrow{m_2} M_2$.

In a direct model modification, model $M_s$ characterizes an intermediate model where all deletion actions have been performed but nothing has been added yet. To this end, $M_s$ is the intersection of $M_1$ and $M_2$.
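Over plain element sets with inclusion morphisms, the span collapses to set operations; the following is an illustrative simplification of Definition 2, not the categorical construction, and the element names are made up:

```python
def model_modification(m1, m2):
    """Return (Ms, deleted, created) for the span M1 <- Ms -> M2."""
    ms = m1 & m2            # intermediate model: survives deletion and addition
    deleted = m1 - ms       # M1 \ m1(Ms)
    created = m2 - ms       # M2 \ m2(Ms)
    return ms, deleted, created

# Fig. 3 in miniature: one transition deleted, one state plus two
# transitions created (hypothetical element names).
m1 = {"S.1.1.0", "S.1.1.1", "t_auth_old"}
m2 = {"S.1.1.0", "S.1.1.1", "S_new", "t_auth_1", "t_auth_2"}
```

The intersection plays the role of $M_s$, and the two differences are exactly the red (deleted) and green (created) element sets of Example 2.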

Fig. 3. Excerpt of a model modification

*Example 2 (Direct model modification).* Figure 3 shows a model modification using our running example. While Fig. 3(a) focuses on the concrete model syntax, Fig. 3(b) shows the changing abstract syntax graph. Figure 3(a) depicts an excerpt of the composite state Authorising. The red transition is deleted while the green state and transitions are created. The model modification $m : M_1 \xleftarrow{m_1} M_s \xrightarrow{m_2} M_2$ is illustrated in Fig. 3(b). The red elements represent the set of nodes (and edges) $M_1 \setminus m_1(M_s)$ to be deleted. The set $M_2 \setminus m_2(M_s)$ describing the nodes (and edges) to be created is illustrated by the green elements. All other nodes (and edges) represent the intermediate model $M_s$.

The *double pushout* approach to graph transformation [15] is a special kind of model modification:

Definition 3 (Rule application). *Given a model* $G$ *and a model modification* $r : L \xleftarrow{l} K \xrightarrow{r} R$, *called* rule, *with an injective morphism* $m : L \to G$, *called* match, *the rule application* $G \Longrightarrow_{r,m} H$ *is defined by the following two pushouts:*

Model $H$ is constructed in two passes: (1) $D := G \setminus m(L \setminus l(K))$, i.e., erase all model elements that are to be deleted; (2) $H := D \cup m'(R \setminus r(K))$, where $m'$ embeds the created elements, such that a new copy of all model elements that are to be created is added.

Note that the first pushout above exists if $G \setminus m(L \setminus l(K))$ does not yield dangling edges [15]. It is obvious that the result of a rule application $G \Longrightarrow_r H$ is a direct model modification $G \xleftarrow{g} D \xrightarrow{h} H$.
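Under the same set-based simplification (an injective match given as a dictionary, fresh copies tagged explicitly, and the dangling-edge condition ignored), the two-pass construction of Definition 3 can be sketched as:

```python
def apply_rule(g, rule_l, rule_k, rule_r, match):
    """Apply rule L <- K -> R to model G at `match` (a map from L into G)."""
    # Pass 1: D := G \ m(L \ l(K)) -- erase matched elements not preserved by K.
    deleted = {match[x] for x in rule_l - rule_k}
    d = g - deleted
    # Pass 2: H := D plus a fresh copy of R \ r(K).
    created = {("new", x) for x in rule_r - rule_k}
    return d | created

g = {"a", "b", "c"}
rule_l, rule_k, rule_r = {"l1", "l2"}, {"l1"}, {"l1", "r1"}
h = apply_rule(g, rule_l, rule_k, rule_r, {"l1": "a", "l2": "b"})
```

Here `l2` has no preimage in K, so its match `b` is erased in pass 1, while `r1` is created fresh in pass 2; the preserved element `a` passes through unchanged, as in a pushout complement followed by a pushout.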

#### 3.3 Model Slicing

In general, a model slice is an interesting part of a model comprising a given slicing criterion. It is up to a concrete slicing definition to specify which model parts are of interest.

Definition 4 (Model slice). *Given a model* $M$ *and a* slicing criterion $C$ *with a morphism* $c : C \to M$, *a* model slice $S = Slice(M, c)$ *is a model* $S$ *such that there are two morphisms* $m : S \to M$ *and* $e : C \to S$ *with* $m \circ e = c$.

Note that each model slice $S = Slice(M, c)$ induces a model modification $C \xleftarrow{id_C} C \xrightarrow{e} S$.
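With element sets and inclusion morphisms, the commuting condition $m \circ e = c$ collapses to a chain of inclusions; a minimal illustrative check (not from the paper):

```python
def is_slice(criterion, candidate, model):
    """C embeds into S and S embeds into M, i.e. C is a subset of S, S of M."""
    return criterion <= candidate <= model
```

A candidate that drops a criterion element, or adds elements outside the model, fails the check; concrete slicers then differ only in *which* admissible $S$ they pick.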

Fig. 4. Excerpt of two model slices

*Example 3 (Model slice).* Figure 4 depicts an excerpt of the model graph of $M$ depicted in Fig. 1 and the two slices $S_{back} = Slice(M, c_{back})$ and $S_{edit} = Slice(M, c_{edit})$. $S_{back}$ is the backward slice as informally described in Sect. 2. $C_{back} = \{S.1.0.1\}$ is the first slicing criterion. The embedding $c_{back}(C_{back})$ is represented by the gray-filled element, while the embedding $m_{back}(S_{back})$ is represented by the blue-bordered elements. Model $e_{back}(C_{back})$ is illustrated by the gray-filled state having a blue border, and $S_{back} \setminus e_{back}(C_{back})$ by the green-filled elements having a blue border.

Let $S_{back}$ be the slicing criterion for the slice $S_{edit}$, i.e., $C_{edit} = S_{back}$ and $c_{edit}(C_{edit}) = m_{back}(S_{back})$. $S_{edit}$ is the extracted editable submodel introduced in Sect. 2 by use case B. Its embedding $m_{edit}(S_{edit})$ is represented by the blue- and orange-bordered elements. Model $e_{edit}(C_{edit})$ is illustrated by the blue-bordered elements and $S_{edit} \setminus e_{edit}(C_{edit})$ by the green-filled elements having an orange border.

#### 3.4 Incremental Slice Update

While working with a model slice, it may happen that the slicing criterion has to be modified. The update of the corresponding model slice can then be performed incrementally. In practice, slicing criteria may change rather frequently, e.g., when independent submodels of a large model are edited in cooperative work.

Definition 5 (Slice update construction). *Given a model slice* S1 = Slice(M, C1 → M) *and a direct model modification* c = C1 ←c1− Cs −c2→ C2*, slice* S2 = Slice(M, C2 → M) *can be constructed as follows:*


*All model modifications are concatenated, yielding the direct model modification* S1 ←e1◦c1− Cs −e2◦c2→ S2*, called* slice update construction *(see also Fig. 6).*
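The concatenation in Definition 5 can be sketched with morphisms represented as Python dicts (element → element), where (g ∘ f)(x) = g(f(x)); all element names are illustrative.

```python
def compose(g, f):
    """Morphism composition g ∘ f on dict-encoded mappings."""
    return {x: g[f[x]] for x in f}

def concatenate_spans(c1, c2, e1, e2):
    """Legs of the slice update modification S1 <-(e1∘c1)- Cs -(e2∘c2)-> S2."""
    return compose(e1, c1), compose(e2, c2)
```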

*Example 4 (Slice update example).* Figure 5 illustrates a slice update construction with Sedit = Slice(M, Cedit → M) being the extracted submodel of our previous example, illustrated by the red-dashed box. The modification c : Cedit ←cedit− Cs −cedit′→ Cedit′ of the slicing criterion is depicted by the gray-filled elements. The red-bordered elements represent the set Cedit \ cedit(Cs) of elements removed from the slicing criterion. The green-bordered elements form the set Cedit′ \ cedit′(Cs) of elements added to the slicing criterion. Sedit′ = Slice(M, Cedit′ → M) is the extracted submodel represented by the green-dashed box. Consequently, the slice is updated by deleting all elements in Sedit \ eedit(cedit(Cs)), represented by the red-bordered and red- and white-filled elements, and adding all elements in Sedit′ \ eedit′(cedit′(Cs)), represented by the green-bordered and green- and white-filled elements. Note that the white-filled elements are removed and added again. This motivates the incremental slice updates defined below.

Fig. 5. Excerpt of an (incremental) slice update.

Definition 6 (Incremental slice update). *Given* M *and* C1 → M *as in Definition 4 as well as a direct model modification* C1 ←c1− Cs −c2→ C2*, model slice* S1 = Slice(M, C1 → M) *is incrementally updated to model slice* S2 = Slice(M, C2 → M)*, yielding a direct model modification* S1 ←s1− Ss −s2→ S2*, called* incremental slice update *from* S1 *to* S2*, with* s1 *and* s2 *being the pullback of* m1 : S1 → M *and* m2 : S2 → M *(see also Fig. 6).*

*Example 5 (Incremental slice update example).* Consider Sedit and Sedit′ of our previous example, together with the model modification Sedit ←sedit− Ss −sedit′→ Sedit′, where Ss is isomorphic to the intersection of Sedit and Sedit′ in M, i.e., ms : Ss → medit(Sedit) ∩ medit′(Sedit′) with ms being an isomorphism due to the pullback construction. Ss is illustrated by the elements contained in the intersection of the red- and green-dashed boxes in Fig. 5. In contrast to the slice update construction of the previous example, the white-filled elements are not affected by the incremental slice update.
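For inclusion morphisms, the pullback in Definition 6 amounts to a plain intersection Ss = S1 ∩ S2 (as subsets of M): the incremental update deletes S1 \ Ss, adds S2 \ Ss, and leaves the shared elements untouched. A set-based sketch:

```python
def incremental_update(S1, S2):
    """Pullback of two subset inclusions into M, computed as intersection."""
    Ss = S1 & S2                    # pullback object (up to isomorphism)
    return Ss, S1 - Ss, S2 - Ss     # kept, deleted, added
```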

Fig. 6. Incremental slice update

Ideally, the slice update construction in Definition 5 would not yield a different update than the incremental one. However, this is not the case in general, since the incremental update keeps as many model elements as possible, in contrast to the update construction in Definition 5. In any case, both update constructions should be compatible with each other, i.e., they should be in an embedding relation, as stated in the following proposition.

Proposition 1 (Compatibility of slice update constructions). *Given* M *and* C1 *as in Definition 4 as well as a direct model modification* C1 ←c1− Cs −c2→ C2*, the model modification resulting from the slice update construction in Definition 5 can be embedded into the incremental slice update from* S1 *to* S2 *(see also Fig. 6).*

Proof idea: Given an incremental slice update S1 ←s1− Ss −s2→ S2, it is the pullback of m1 : S1 → M and m2 : S2 → M. The slice update construction yields m1 ◦ e1 ◦ c1 = m2 ◦ e2 ◦ c2. Due to pullback properties, there is a unique embedding e : Cs → Ss with s1 ◦ e = e1 ◦ c1 and s2 ◦ e = e2 ◦ c2.<sup>2</sup>

## 4 Instantiation of the Formal Framework

In this section, we present an instantiation of our formal framework which is inspired by the model slicing tool introduced in [8]. The basic idea of the approach is to create and incrementally update model slices by calculating and applying a special form of model patches, introduced and referred to as edit script in [17].

<sup>2</sup> This proof idea can be elaborated into a full proof in a straightforward manner.

#### 4.1 Edit Scripts as Refinements of Model Modifications

An *edit script* ΔM1⇒M2 specifies how to transform a model M1 into a model M2 in a stepwise manner. Technically, it is a data structure comprising a set of rule applications, partially ordered by an acyclic dependency graph: its nodes are rule applications and its edges are dependencies between them [17]. Models are represented as typed graphs as in Definition 1; rule applications are defined as in Definition 3. Hence, the semantics of an edit script is a set of rule application sequences taking all possible orderings of rule applications into account. Each sequence can be condensed into the application of one rule following the concurrent rule construction in, e.g., [15]. Hence, an edit script ΔM1⇒M2 induces a set of model modifications of the form M1 ←m1− Ms −m2→ M2.
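The edit-script semantics described above can be sketched naively: the valid rule application sequences are exactly the topological orderings of the (acyclic) dependency graph. The rule application names below are illustrative, and brute-force enumeration is used only for exposition.

```python
from itertools import permutations

def application_sequences(rule_apps, deps):
    """deps contains (a, b) iff b depends on a, i.e., a must be applied first."""
    def respects_deps(seq):
        pos = {r: i for i, r in enumerate(seq)}
        return all(pos[a] < pos[b] for a, b in deps)
    return [seq for seq in permutations(rule_apps) if respects_deps(seq)]
```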

Given two models M<sup>1</sup> and M<sup>2</sup> as well as a set R of transformation rules for this type of models, edit scripts are calculated in two basic steps [17]:

First, the corresponding elements in M1 and M2 are calculated using a model matcher [18]. A basic requirement is that such a matching can be formally represented as a (partial) injective morphism c : M1 → M2. If so, the matching morphism c yields a unique model modification m : M1 ←⊇− Ms −m2→ M2 (up to isomorphism) with m2 = c|Ms. This means that Ms always has to be a graph.

Second, an edit script is derived. Elementary model changes can be directly derived from a model matching; elements in M1 and M2 which are not involved in a correspondence can be considered as deleted and added, respectively [19]. The approach presented in [17] partitions the set of elementary changes such that each partition represents the application of a transformation rule of the given set R of transformation rules [20], and subsequently calculates the dependencies between these rule applications [17], yielding an edit script ΔM1⇒M2. Sequences of rule applications of an edit script do not contain transient effects [17], i.e., pairs of change actions which cancel each other out (such as creating and later deleting one and the same element). Such transient change actions are thus factored out by an edit script.
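The derivation of elementary changes from a matching can be sketched as follows, with the matching encoded as a dict from M1 elements to M2 elements (a partial injective mapping); element names are illustrative.

```python
def elementary_changes(M1, M2, matching):
    """Unmatched elements of M1 are deleted; unmatched elements of M2 are added."""
    matched_in_M1 = set(matching)
    matched_in_M2 = set(matching.values())
    return {"deleted": M1 - matched_in_M1, "added": M2 - matched_in_M2}
```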

#### 4.2 Model Slicing Through Slice-Creating Edit Scripts

Edit scripts are also used to construct new model slices. Given a model M and a slicing criterion C, a *slice-creating edit script* Δ∅⇒S is calculated which, when applied to the empty model ∅, yields the resulting slice S. The basic idea to construct Δ∅⇒S is to consider the model M as created by an edit script Δ∅⇒M applied to the empty model and to identify a sub-script of Δ∅⇒M which (at least) creates all elements of C. The slice-creating edit script Δ∅⇒S consists of the subgraph of the dependency graph of the model-creating edit script Δ∅⇒M containing (i) all nodes which create at least one model element in C, and (ii) all required nodes and connecting edges according to the transitive closure of the "required" relation, which is implied by dependencies between rule applications.
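The construction just described can be sketched as a reverse closure over the dependency graph: seed with the rule applications that create criterion elements, then close under the "required" relation. The encoding (rule application → created elements, dependency pairs) is illustrative.

```python
def slice_creating_script(creates, deps, criterion):
    """creates: rule app -> set of created elements; deps has (a, b) iff b requires a."""
    required = {}
    for a, b in deps:
        required.setdefault(b, set()).add(a)
    # (i) seed: all rule applications creating at least one criterion element
    script = {p for p, elems in creates.items() if elems & criterion}
    stack = list(script)
    # (ii) transitive closure of the "required" relation
    while stack:
        for a in required.get(stack.pop(), ()):
            if a not in script:
                script.add(a)
                stack.append(a)
    return script
```

With a chain of dependencies p1 → … → p7 where only p7 creates the criterion element, the whole chain ends up in the slice-creating script, matching the example in Sect. 5.1.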

Since the construction of edit scripts depends on a given set R of transformation rules, *a basic applicability condition is that all possible models and all possible slices can be created by rules available in* R. Given that this condition is satisfied, model slicing through slice-creating edit scripts indeed behaves according to Definition 4, i.e., a slice S = Slice(M, C → M) is obtained by applying Δ∅⇒S to the empty model: the resulting slice S is a submodel of M and a supermodel of C. As we will see in Sect. 5, the behavior of a concrete model slicer and thus its intended purpose is configured by the transformation rule set R.

#### 4.3 Incremental Slicing Through Slice-Updating Edit Scripts

To incrementally update a slice S1 = Slice(M, C1 → M) to become slice S2 = Slice(M, C2 → M), we show that the approach presented in [8] constructs a *slice-updating edit script* ΔS1⇒S2 which, applied to the current slice S1, yields S2 in an incremental way.

Similar to the construction of slice-creating edit scripts, the basic idea is to consider the model M as created by a model-creating edit script Δ∅⇒M. The slice-updating edit script must delete all elements in the set S1 \ S2 from the current slice S1, while adding all model elements in S2 \ S1. It is constructed as follows: let PS1 and PS2 be the sets of rule applications which create all the elements in S1 and S2, respectively. Next, the sets Prem and Padd of rule applications in Δ∅⇒M are determined with Prem = PS1 \ PS2 and Padd = PS2 \ PS1. Finally, the resulting edit script ΔS1⇒S2 contains (1) the rule applications in set Padd, with the same dependencies as in Δ∅⇒M, and (2) for each rule application in Prem, its inverse rule application, with dependencies reversed w.r.t. Δ∅⇒M. By construction, there cannot be dependencies between rule applications of the two sets, so they can be executed in arbitrary order.
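The slice-updating construction can be sketched over sets of rule application identifiers: removed creations become inverse applications with reversed dependencies, while additions keep their original dependencies. The `inv_` prefix is an illustrative stand-in for the inverse rule application.

```python
def slice_updating_script(P_S1, P_S2, deps):
    """deps contains (a, b) iff b depends on a in the model-creating edit script."""
    P_rem, P_add = P_S1 - P_S2, P_S2 - P_S1
    inv = lambda p: "inv_" + p
    # (1) additions keep their original dependencies
    add_deps = {(a, b) for (a, b) in deps if a in P_add and b in P_add}
    # (2) removals become inverse applications with reversed dependencies
    rem_deps = {(inv(b), inv(a)) for (a, b) in deps if a in P_rem and b in P_rem}
    return {inv(p) for p in P_rem} | P_add, rem_deps | add_deps
```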

In addition to the completeness of the set R of transformation rules for a given modeling language (see Sect. 4.2), *a second applicability condition is that, for each rule* r *in* R*, there must be an inverse rule* r⁻¹ *which reverts the effect of* r. Given that these conditions are satisfied and a slice-updating edit script ΔS1⇒S2 can be created, its application to S1 indeed behaves according to the incremental slice update of Definition 6. This is because, by construction, none of the model elements in the intersection of S1 and S2 in M is deleted by the edit script ΔS1⇒S2. Consequently, none of the elements in the intersection of C1 and C2 in M, which is a subset of S1 ∩ S2, is deleted.

#### 4.4 Implementation

The framework instantiation has been implemented using a set of standard MDE technologies on top of the widely used Eclipse Modeling Framework (EMF), which employs an object-oriented implementation of graph-based models in which nodes and edges are represented as objects and references, respectively. Edit scripts are calculated using the model differencing framework SiLift [21], which uses EMF Compare [22] to determine the corresponding elements in a pair of models. A matching determined by EMF Compare fulfills the requirements presented in Sect. 4.1 since EMF Compare (a) delivers 1:1 correspondences between elements, thus yielding an injective mapping, and (b) implicitly matches edges if their respective source and target nodes are matched and if they have the same type (because EMF generally does not support parallel edges of the same type), thus yielding an edge-preserving mapping. Finally, transformation rules are implemented using the model transformation language and framework Henshin [23,24], which is based on graph transformation concepts.

## 5 Solving the Motivating Examples

In this section, we outline the configurations of two concrete model slicers which are based on the framework instantiation presented in Sect. 4, and which are capable of solving the motivating examples introduced in Sect. 2. Each of these slicers is configured by a set of Henshin transformation rules which are used for the calculation of model-creating, and thus for the construction of slice-creating and slice-updating, edit scripts. The complete rule sets can be found at the accompanying website of this paper [25].

#### 5.1 A State-Based Model Slicer

Two of the creation rules which are used to configure a state-based model slicer as described in our first example of Sect. 2 are shown in Fig. 7. The rules are depicted in an integrated form: the left- and right-hand sides of a rule are merged into a unified model graph following the visual syntax of the Henshin model transformation language [23].

Fig. 7. Subset of the creation rules for configuring a state-based model slicer

Most of the creation rules are of a similar form as the creation rule *createPseudostate*, which simply creates a pseudostate and connects it with an existing container. The key idea of this slicer configuration, however, is the special creation rule *createStateWithTransition*, which creates a state together with an incoming transition in a

Fig. 8. Slice-creating edit script.

single step. To support the incremental updating of slices, for each creation rule an inverse deletion rule is included in the overall set of transformation rules. Parts of the resulting model-creating edit script using these rules are shown in Fig. 8. For example, rule application *p3* creates the state *Idle* in the top-level region of the state machine *PSCSystem*, together with an incoming transition having the initial state of the state machine, created by rule application *p2*, as source state. Thus, *p3* depends on *p2* since the initial state must be created first. Similar dependency relationships arise for the creation of other states which are created together with an incoming transition.

The effect of this configuration on the behavior of the model slicer is as follows (illustrated here for the creation of a new slice): If state *S.1.0.1* is selected as slicing criterion, as in our motivating example, rule application *p7* is included in the slice-creating edit script since it creates that state. Implicitly, all rule applications on which *p7* transitively depends, i.e., rule applications *p1* to *p6*, are also included in the slice-creating edit script. Consequently, applying the slice-creating edit script to an empty model yields a submodel of the state machine of Fig. 1 which contains a transition path from its initial state to state *S.1.0.1*, according to the desired behavior of the slicer.

A current limitation of our solution is that, for each state *s* of the slicing criterion, only a single transition path from the initial state to state *s* is sliced. This path is determined non-deterministically from the set of all possible paths from the initial state to state *s*. To overcome this limitation, rule schemes comprising a kernel rule and a set of multi-rules (see, e.g., [26,27]) would have to be supported by our approach. Then, a rule scheme for creating a state with an arbitrary number of incoming transitions could be included in the configuration of our slicer, which in turn leads to the desired effect during model slicing. We leave support for rule schemes to future work.

#### 5.2 A Slicer for Extracting Editable Submodels

In general, editable models adhere to a basic form of consistency which we assume to be defined by the effective meta-model of a given model editor [28]. The basic idea of configuring a model slicer for extracting editable submodels, adopted from [8], is that all creation and deletion rules preserve this level of consistency. Given an effective meta-model, such a rule set can be generated using the approach presented in [28] and its EMF-/UML-based implementation [29,30].

In our motivating example of Sect. 2, for instance, a consistency-preserving creation rule *createTrigger* creates an element of type *Trigger* and immediately connects it to an already existing operation of a class. The operation serves as the *callEvent* of this trigger and needs to be created first, which leads to a dependency in a model-creating edit script. Thus, if a trigger is included in the slicing criterion, the operation serving as *callEvent* of that trigger will be implicitly included in the resulting slice since it is created by the slice-creating edit script.

#### 6 Related Work

A large number of model slicers have been developed. Most of them work only with one specific type of models, notably state machines [4] and other types of behavioral models such as MATLAB/Simulink block diagrams [5]. Other supported model types include UML class diagrams [31], architectural models [32], and system models defined using the SysML modeling language [33]. None of these approaches can be transferred to other (domain-specific) modeling languages, and they do not abstract from concrete slicing specifications.

The only well-known, more generally usable technique which is adaptable to a given modeling language and slicing specification is Kompren [7]. In contrast to our formal framework, however, Kompren does not abstract from the concrete model modification approach and implementation technologies. It offers a domain-specific language based on the Kermeta model transformation language [34] to specify the behavior of a model slicer, and a generator which generates a fully functioning model slicer from such a specification. When Kompren is used in the so-called active mode, slices are incrementally updated when the input model changes, according to the principle of incremental model transformation [35]. In our approach, slices are incrementally updated when the slicing criterion is modified. As long as only endogenous model transformations are used for constructing slices, Kompren could easily be extended to become an instantiation of our formal framework.

Incremental slicing has also been addressed in [36], however, using a notion of incrementality which fundamentally differs from ours. The technique has been developed in the context of testing model-based delta-oriented software product lines [37]. Rather than incrementally updating an existing slice, the approach incrementally processes the product space of a product line, where each "product" is specified by a state machine model. As in software regression testing, the goal is to obtain retest information by utilizing differences between state machine slices obtained from different products.

In a broader sense, related work can be found in the area of model splitting and model decomposition. The technique presented in [38] aims at splitting a model into submodels according to linguistic heuristics and using information retrieval techniques. The model decomposition approach presented in [39] considers models as graphs and first determines strongly connected graph components from which the space of possible decompositions is derived in a second step. Both approaches are different from ours in that they produce a partitioning of an input model instead of a single slice. None of them supports the incremental updating of a model partitioning.

#### 7 Conclusion

We presented a formal framework for defining model slicers that support incremental slice updates, based on a general concept of model modifications. Incremental slice updates were shown to be compatible with non-incremental ones. Furthermore, we presented a framework instantiation based on the concept of edit scripts defining application sequences of model transformation rules. This instantiation was implemented by two concrete model slicers based on the Eclipse Modeling Framework and the model differencing framework SiLift.

As future work, we plan to investigate incremental updates of both the underlying model and the slicing criterion. It is also worthwhile to examine the extent to which further concrete model slicers fit into our formal framework of incremental model slicing. For our own instantiation of this framework, we plan to cover further model transformation features such as rule schemes and application conditions, which will make the configuration of concrete model slicers more flexible and enable us to support further use cases and purposes.

Acknowledgments. This work was partially supported by the DFG (German Research Foundation) under the Priority Programme SPP1593: Design For Future - Managed Software Evolution.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Multiple Model Synchronization with Multiary Delta Lenses

Zinovy Diskin<sup>1</sup>, Harald König<sup>2</sup>, and Mark Lawford<sup>1</sup>

<sup>1</sup> McMaster University, Hamilton, Canada
{diskinz,lawford}@mcmaster.ca
<sup>2</sup> University of Applied Sciences FHDW Hannover, Hannover, Germany
harald.koenig@fhdw.de

Abstract. Multiple (more than 2) model synchronization is ubiquitous and important for MDE, but its theoretical underpinning gained much less attention than the binary case. Specifically, the latter was extensively studied by the bx community in the framework of algebraic models for update propagation called *lenses*. Now we make a step to restore the balance and propose a notion of multiary delta lens. Besides multiarity, our lenses feature *reflective* updates, when consistency restoration requires some amendment of the update that violated consistency. We emphasize the importance of various ways of lens composition for practical applications of the framework, and prove several composition results.

## 1 Introduction

Modelling normally results in a set of inter-related models presenting different views of the system. If one of the models changes and their joint consistency is violated, the related models should also be changed to restore consistency. This task is obviously of paramount importance for MDE, but its theoretical underpinning is inherently difficult and reliable practical solutions are rare. There are working solutions for file synchronization in systems like Git, but they are not applicable in the UML/EMF world of diagrammatic models. For the latter, much work has been done for the binary case (synchronizing two models) by the bidirectional transformation community (bx) [15], specifically in the framework of so-called *delta lenses* [3], but the multiary case (the number of models to be synchronized is n ≥ 2) has gained much less attention; cf. the energetic call to the community in a recent paper by Stevens [16].

The context underlying bx is model transformation, in which one model in the pair is considered a transform of the other, even though updates are propagated in both directions (so-called round-tripping). Once we go beyond n = 2, we at once switch to a more general context of model inter-relations beyond model-to-model transformations. Such situations have been studied in the context of multi-view system consistency, but rarely in the context of an accurate formal basis for update propagation. The present paper can be seen as an adaptation of the (delta) lens-based update propagation framework to the multi-view consistency problem. We will call it multi-directional update propagation, or *mx*, following the bx-pattern. Our contributions to mx are as follows.

We show with a simple example (Sect. 2) an important special feature of mx: consistency restoration may require not only propagating updates to other models but also amending the very update that caused the inconsistency (even in a two-view system!); thus, update propagation should, in general, be *reflective*. Moreover, even if consistency can be restored without a reflective amendment, there are cases when such reflection is still reasonable. This means that *Hippocraticness* [15], a major requirement for classical bx, may have less weight in the mx world. In Sect. 3, we provide a formal definition of *multiary* (symmetric) lenses with reflection, and define (Sect. 4) several operations of lens composition producing complex lenses from simple ones. Specifically, we show how n-ary lenses can be composed from n-tuples of asymmetric binary lenses (Theorems 1 and 2), thus giving a partial solution to the challenging issue of building mx synchronization via bx discussed by Stevens in [16]. We consider lens composition results important for practical application of the framework: if the tool builder has implemented a library of elementary synchronization modules based on lenses, hence ensuring basic laws for change propagation, then a complex module assembled from elementary lenses will automatically be a lens and thus also enjoy these laws.

## 2 Example

We will consider a simple example motivating our framework. Many formal constructs below will be illustrated with the example (or its fragments) and referred to as *Running example*.

Fig. 1. Multi-metamodel in UML

## 2.1 A Multimodel to Play With

Suppose there are two data sources whose schemas (we say metamodels) are shown in Fig. 1 as class diagrams M1 and M2, both recording employment. The first source is interested in the employment of people living in downtown; the second one is focused on software companies and their recently graduated employees. In general, the populations of classes Person and Company in the two sources can be different; they can even be disjoint, but if a recently graduated downtowner works for a software company, her appearance in both databases is very likely. Now suppose there is an agency investigating traffic problems, which maintains its own data on commuting between addresses (see schema M3), computable by an obvious relational join over M1 and M2. In addition, the agency supervises consistency of the two sources and requires that if they both know a person p and a company c, then they must agree on the employment record (p, c): it is either stored by both or by neither of the sources. For this synchronization, it is assumed that persons and companies are globally identified by their names. Thus, a triple of data sets (we will say models) A1, A2, A3, instantiating the respective metamodels, can be either consistent (if the constraints described above are satisfied) or inconsistent (if they are not). In the latter case, we normally want to change some or all models to restore consistency. We will call a collection of models to be kept in sync a *multimodel*.

To talk about constraints for multimodels, we need an accurate notation. If A is a model instantiating metamodel M and X is a class in M, we write X<sup>A</sup> for the set of objects instantiating X in A. Similarly, if r : X1 ↔ X2 is an association in M, we write r<sup>A</sup> for the corresponding binary relation over X1<sup>A</sup> × X2<sup>A</sup>. For example, Fig. 2 presents a simple model A1 instantiating M1 with Person<sup>A1</sup> = {p1, p′1}, Company<sup>A1</sup> = {c1}, empl-er<sup>A1</sup> = {(p1, c1)}, and similarly for attributes, e.g.,

$$\mathsf{lives}^{A_1} = \{(p_1, a1),\; (p'_1, a1)\} \subset \mathsf{Person}^{A_1} \times \mathsf{Addr}$$

(lives<sup>A1</sup> and also name<sup>A1</sup> are assumed to be functions, and Addr is the (model-independent) set of all possible addresses). The triple (A1, A2, A3) is a (state of a) multimodel over the multi-metamodel (M1, M2, M3), and we say it is *consistent* if the two constraints specified below are satisfied. Constraint (C1) specifies mutual consistency of models A1 and A2 in the sense described above; constraint (C2) specifies consistency between the agency's view of the data and the two data sources:

$$(\text{C1})\quad \begin{array}{c} \text{if } p \in \mathsf{Person}^{A_1} \cap \mathsf{Person}^{A_2} \text{ and } c \in \mathsf{Company}^{A_1} \cap \mathsf{Company}^{A_2} \\ \text{then } (p, c) \in \mathsf{empl\text{-}er}^{A_1} \text{ iff } (c, p) \in \mathsf{empl\text{-}ee}^{A_2} \end{array}$$

$$(\text{C2})\quad \left(\mathsf{lives}^{A_1}\right)^{-1} \bowtie \left(\mathsf{empl\text{-}er}^{A_1} \cup \left(\mathsf{empl\text{-}ee}^{A_2}\right)^{-1}\right) \bowtie \mathsf{located}^{A_2} \subseteq \mathsf{Commute}^{A_3}$$

where <sup>−1</sup> refers to the inverse relation and ⋈ denotes relational join (composition); using subsetting rather than equality in (C2) assumes that there are other data sources the agency can use. Note that constraint (C1) inter-relates two component models of the multimodel, while (C2) involves all three components and forces synchronization to be 3-ary.

It is easy to see that multimodel A1,2,3 in Fig. 2 is "two-times" inconsistent: (C1) is violated as both A1 and A2 know Mary and IBM, and (IBM, Mary) ∈ empl-ee<sup>A2</sup> but (Mary, IBM) ∉ empl-er<sup>A1</sup>; (C2) is violated as A1 and A2 show a commuting pair (a1, a15) not recorded in A3. We will discuss consistency restoration in the next subsection, but first we need to discuss an important part of the multimodel, traceability or correspondence mappings, held implicit so far.
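The two consistency checks can be sketched over plain tuple-set encodings of the relations involved. The sample data in the usage below is illustrative, loosely following Fig. 2 (Mary known to both sources, employed by IBM only according to A2), not the paper's exact model.

```python
def join(r, s):
    """Relational join of two binary relations."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}

def inv(r):
    """Inverse of a binary relation."""
    return {(b, a) for (a, b) in r}

def check_C1(persons1, persons2, companies1, companies2, empl_er1, empl_ee2):
    # (C1): for jointly known persons and companies, employment records agree.
    return all(((p, c) in empl_er1) == ((c, p) in empl_ee2)
               for p in persons1 & persons2
               for c in companies1 & companies2)

def check_C2(lives1, empl_er1, empl_ee2, located2, commute3):
    # (C2): the commuting pairs derivable from A1 and A2 are recorded in A3.
    return join(join(inv(lives1), empl_er1 | inv(empl_ee2)), located2) <= commute3
```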

Fig. 2. A(n inconsistent) multimodel <sup>A</sup>*†* over the multi-metamodel in Fig. <sup>1</sup>

Indeed, classes Person<sup>A1</sup> and Person<sup>A2</sup> are interrelated by a correspondence relation linking persons with the same name, and similarly for Company. These correspondence links (we will write corr-links) may be kept implicit as they can always be restored. More important is to maintain corr-links between Commute<sup>A3</sup> and empl-er<sup>A1</sup> ∪ empl-ee<sup>A2</sup>. Indeed, class Commute together with its two attributes can be seen as a relation, and this relation can be instantiated by a multirelation, as people living at the same address can work for companies located at the same address. If such a Commute-object is deleted, and this deletion is to be propagated to models A1 and A2, we need corr-links to know which employment links are to be deleted. Hence, it makes sense to establish such links when objects are added to Commute<sup>A3</sup> and to use them later for deletion propagation.

Importantly, for given models A1,2,3, there may be several different correspondence mappings: the same Commute-object can correspond to different commute-links over A1 and A2. In fact, the multiplicity of possible corr-specifications is a general phenomenon that can only be avoided if absolutely reliable keys are available; e.g., if we suppose that persons and companies can always be uniquely identified by names, then corrs between these classes are unique. But if keys (e.g., person names) are not absolutely reliable, we need a separate procedure of model matching or alignment that has to establish whether objects p1 ∈ Person<sup>A1</sup> and p2 ∈ Person<sup>A2</sup>, both named Mary, represent the same real-world object. The constraints we declared above implicitly involve corr-links; e.g., the formula for (C1) is syntactic sugar for the following formal statement: if there are corr-links p = (p1, p2) and c = (c1, c2) with p<sub>i</sub> ∈ Person<sup>Ai</sup>, c<sub>i</sub> ∈ Company<sup>Ai</sup> (i = 1, 2), then the following holds: (p1, c1) ∈ empl-er<sup>A1</sup> iff (c2, p2) ∈ empl-ee<sup>A2</sup>. A precise formal account of this discussion can be found in [10].

Thus, a multimodel is actually a tuple A = (A1, A2, A3, R), where R is a collection of correspondence relations over the sets involved. This R is implicit in Fig. 2 since in this very special case it can be restored. Consistency of a multimodel is a property of the entire 4-tuple A rather than of its 3-tuple carrier (A1, A2, A3).

#### 2.2 Synchronization via Update Propagation

There are several ways to restore consistency of the multimodel in Fig. 2 w.r.t. constraint (C1). We may delete Mary from A1, or delete her employment with IBM from A2, or even delete IBM from A2. We can also change Mary's employment from IBM to Google, which will restore (C1) as A1 does not know Google. Similarly, we can delete John's record from A1, and then Mary's employment with IBM in A2 would not violate (C1). As the number of constraints and the elements they involve increase, the number of consistency restoration variants grows fast.

The range of possibilities can be essentially decreased if we take into account the history of creating the inconsistency and consider not only the inconsistent state A† but also the update u: A → A† that created it (assuming that A is consistent). For example, suppose that initially model A1 contained the record (Mary, IBM) (and A3 contained the (a1, a15)-commute), and the inconsistency appeared after Mary's employment with IBM was deleted in A1. Then it is reasonable to restore consistency by deleting this employment record in A2 too; we say that the deletion was propagated from A1 to A2. If the inconsistency appears after adding the (IBM, Mary)-employment to A2, then it is reasonable to restore consistency by adding such a record to A1. Although propagating deletions/additions to deletions/additions is typical, there are non-monotonic cases too. Let us assume that Mary and John are spouses (they live at the same address), and that IBM follows an exotic policy prohibiting spouses from working together. Then we can interpret the addition of the (IBM, Mary)-record to A2 as swapping the family member working for IBM, and then (John, IBM) is to be deleted from A1.
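The two monotonic policies just described can be written down directly. The following toy sketch (our simplification: each model is just a set of employment pairs) propagates a deletion from A1 and an addition from A2:

```python
def propagate_delete_from_A1(empl_er, empl_ee, pair):
    """Deleting (p, c) from A1's empl-er is propagated as deleting
    the mirror pair (c, p) from A2's empl-ee."""
    p, c = pair
    return empl_er - {pair}, empl_ee - {(c, p)}

def propagate_add_from_A2(empl_er, empl_ee, pair):
    """Adding (c, p) to A2's empl-ee is propagated as adding
    the mirror pair (p, c) to A1's empl-er."""
    c, p = pair
    return empl_er | {(p, c)}, empl_ee | {pair}

# Deleting Mary's IBM employment in A1 removes it from A2 as well:
er, ee = propagate_delete_from_A1({("Mary", "IBM")}, {("IBM", "Mary")},
                                  ("Mary", "IBM"))
```

The non-monotonic "spouse-swapping" policy would need extra input (the marriage relation) and is omitted here.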

Now let us consider how updates to and from model A3 may be propagated. As mentioned above, traceability/correspondence links play a crucial role here. If additions to A1 or A2 or both create a new commute, the latter has to be added to A3 (together with its corr-links) due to constraint (C2). In contrast, if a new commute is added to A3, we change nothing in A1,2 as (C2) only requires inclusion. If a commute is deleted from A3, and it is traced to a corresponding employment in empl-er<sup>A1</sup> ∪ empl-ee<sup>A2</sup>, then this employment is deleted. (Of course, there are other ways to remove a commute derivable over A1 and A2.) Finally, if a commute-generating employment in empl-er<sup>A1</sup> ∪ empl-ee<sup>A2</sup> is deleted, the respective commute in A3 is deleted too. Clearly, many of the propagation policies above, although formally correct, may contradict the real-world changes and hence should be corrected, but this is a common problem for the majority of automatic synchronization approaches, which have to make guesses in order to resolve the non-determinism inherent in consistency restoration.

#### 2.3 Reflective Update Propagation

An important feature of the update propagation scenarios above is that consistency could be restored without changing the model whose update caused the inconsistency. However, this is not always desirable. Suppose again that the violation of constraint (C1) in the multimodel in Fig. 2 was caused by adding a new person Mary to A1, e.g., as a result of Mary's moving downtown. Now both models know both Mary and IBM, and thus either the employment record (Mary, IBM) is to be added to A1, or the record (IBM, Mary) is to be removed from A2. Either variant is possible, but in our context, adding (Mary, IBM) to A1 seems more likely and less specific than deleting (IBM, Mary) from A2. Indeed, if Mary has just moved downtown, the data source A1 may simply not have completed her record yet. Deleting (IBM, Mary) from A2 seems to be a different event, unless there are strong causal dependencies between moving downtown and working for IBM. Thus, an update policy that would keep A2 unchanged but amend the addition of Mary to A1 by further automatically adding her employment with IBM (as per model A2) seems reasonable. This means that updates can be reflectively propagated (we also say self-propagated).

Of course, self-propagation does not necessarily mean non-propagation in other directions. Consider the following case: model A1 initially contains only the (John, IBM) record and is consistent with A2 shown in Fig. 2. Then the record (Mary, Google) is added to A1, which thus becomes inconsistent with A2. To restore consistency, (Mary, Google) is to be added to A2 (the update is propagated from A1 to A2) and (Mary, IBM) is to be added to A1 as discussed above (i.e., the addition of (Mary, Google) is amended or self-propagated).
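Continuing the toy encoding (a sketch under our own simplifications, with the function name chosen by us), the policy just described both propagates the new pair to A2 and self-propagates the employment A2 already knows:

```python
def add_employment_reflectively(empl_er, empl_ee, person, company):
    """Add (person, company) to A1; propagate it to A2; then reflect back
    into A1 any other employer that A2 records for that person."""
    empl_er = empl_er | {(person, company)}
    empl_ee = empl_ee | {(company, person)}
    reflected = {(p, c) for (c, p) in empl_ee if p == person}
    return empl_er | reflected, empl_ee

# A1 knows only (John, IBM); A2 additionally knows Mary works for IBM.
er, ee = add_employment_reflectively(
    {("John", "IBM")}, {("IBM", "John"), ("IBM", "Mary")}, "Mary", "Google")
```

After the call, A1 holds both (Mary, Google) and the reflected (Mary, IBM), while A2 has gained (Google, Mary), matching the scenario in the text.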

Fig. 3. Update propagation pattern

A general schema of update propagation including reflection is shown in Fig. 3. We begin with a consistent multimodel (A1...An, R)<sup>1</sup>, one of whose members is updated by u<sub>i</sub>: A<sub>i</sub> → A′<sub>i</sub>. The propagation operation, based on a priori defined propagation policies as sketched above, produces:

(a) updates u′<sub>j</sub>: A<sub>j</sub> → A′<sub>j</sub> on all other members, j ≠ i;
(b) a reflective amendment u′<sub>i</sub>: A′<sub>i</sub> → A″<sub>i</sub> of the original update;
(c) a new correspondence R′ such that the resulting multimodel is consistent.
To distinguish the given data from those produced by the operation, the former are shown with framed nodes and solid lines in Fig. 3 while the latter are non-framed and dashed. Below we introduce an algebraic model encompassing several operations and algebraic laws that formally model the situations considered so far.

## 3 Multidirectional Update Propagation and Delta Lenses

A delta-based mathematical model for bidirectional transformations (bx) is well known under the name of delta lenses; below we will say just *lens*. There are two main variants: asymmetric lenses, when one model is a view of the other and hence does not have any private information, and symmetric lenses, when both sides have private data not visible on the other side [2,3,6]. In this section we will develop a framework generalizing the idea to any n ≥ 2 and including reflective updates.

<sup>1</sup> Here we first abbreviate (*A*1*,...,An*) by (*A*1*...An*), and then write (*A*1*...An, R*) for ((*A*1*...An*)*, R*). We will apply this style in other similar cases, and write, e.g., *i* ∈ 1*...n* for *i* ∈ {1*, ..., n*} (this will also be written as *i* ≤ *n*).

#### 3.1 Background: Graphs and Categories

We reproduce well-known definitions to fix our notation. A *(directed multi-)graph* G consists of a set G<sup>•</sup> of *nodes* and a set G<sup>→</sup> of *arrows* equipped with two functions s, t: G<sup>→</sup> → G<sup>•</sup> that give each arrow a its *source* s(a) and *target* t(a) nodes. We write a: N → N′ if s(a) = N and t(a) = N′, and a: N → \_ or a: \_ → N if only one of these conditions is given. Correspondingly, the expressions G<sup>→</sup>(N, N′), G<sup>→</sup>(N, \_), G<sup>→</sup>(\_, N) denote the sets of, resp., all arrows from N to N′, all arrows from N, and all arrows into N.

A (small) *category* is a graph whose arrows are associatively composable and where every node has a special *identity* loop, which is the unit of the composition. In more detail, given two consecutive arrows a<sub>1</sub>: \_ → N and a<sub>2</sub>: N → \_, we denote the composed arrow by a<sub>1</sub>; a<sub>2</sub>. The identity loop of node N is denoted by id<sub>N</sub>, and the equations a<sub>1</sub>; id<sub>N</sub> = a<sub>1</sub> and id<sub>N</sub>; a<sub>2</sub> = a<sub>2</sub> are to hold. A *functor* is a mapping of nodes and arrows from one category to another which respects sources and targets. Given a tuple of categories (**A**1...**A**n), their *product* is the category **A**1 × ... × **A**n whose objects are tuples (A1...An) ∈ **A**<sup>•</sup>1 × ... × **A**<sup>•</sup>n, and whose arrows from (A1...An) to (A′1...A′n) are tuples of arrows (u1...un) with u<sub>i</sub>: A<sub>i</sub> → A′<sub>i</sub> for all i ∈ 1...n.
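These definitions can be rendered directly in code. The following Python sketch (the class and function names are ours) represents arrows with explicit source and target nodes and lets the identity laws be checked concretely:

```python
class Arrow:
    """An arrow a: src -> tgt whose payload is a function on some
    underlying data (think of an update as an edit applied to a model)."""
    def __init__(self, src, tgt, fn):
        self.src, self.tgt, self.fn = src, tgt, fn

    def then(self, other):
        """Composition a1; a2, defined only for consecutive arrows."""
        assert self.tgt == other.src, "arrows must be consecutive"
        return Arrow(self.src, other.tgt, lambda x: other.fn(self.fn(x)))

def ident(node):
    """The identity loop id_N: does nothing."""
    return Arrow(node, node, lambda x: x)

u = Arrow("A", "A2", lambda s: s | {"Mary"})   # an update adding Mary
v = u.then(ident("A2"))                        # u; id_A2 behaves like u
```

With this encoding, u; id<sub>N</sub> and id<sub>N</sub>; u have the same effect as u, which is the identity law stated above.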

#### 3.2 Model Spaces and Correspondences

Basically, a *model space* is a category whose nodes are called *model states* or just *models*, and whose arrows are *(directed) deltas* or *updates*. For an arrow u: A → A′, we treat A as the state of the model before update u, A′ as the state after the update, and u as an update specification. Structurally, it is a specification of correspondences between A and A′. Operationally, it is an edit sequence (edit log) that changed A to A′. The formalism does not prescribe what updates are, but assumes that they form a category, i.e., there may be different updates from state A to state A′; updates are composable; and idle updates id<sub>A</sub>: A → A (doing nothing) are the units of the composition.

In addition, we require every model space **A** to be endowed with a family (K<sub>A</sub>)<sub>A∈**A**<sup>•</sup></sub> of binary relations K<sub>A</sub> ⊂ **A**<sup>→</sup>(\_, A) × **A**<sup>→</sup>(A, \_) indexed by the objects of **A** and specifying *non-conflicting* or *compatible* consecutive updates. Intuitively, an update u into A is compatible with an update u′ from A if u′ does not revert/undo anything done by u, e.g., it does not delete/create objects created/deleted by u, or re-modify attributes modified by u (see [14] for a detailed discussion). Formally, we only require (u, id<sub>A</sub>) ∈ K<sub>A</sub> and (id<sub>A</sub>, u′) ∈ K<sub>A</sub> for all A ∈ **A**<sup>•</sup>, u ∈ **A**<sup>→</sup>(\_, A) and u′ ∈ **A**<sup>→</sup>(A, \_).
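For updates represented concretely as insert/delete sets, one possible compatibility check is the following sketch (the representation of updates as `(inserts, deletes)` pairs is our assumption):

```python
def compatible(u, u_next):
    """u = (inserts, deletes) into state A; u_next = (inserts, deletes)
    out of A. They conflict iff u_next undoes something u did:
    deletes what u inserted, or re-inserts what u deleted."""
    ins, dels = u
    ins2, dels2 = u_next
    return not (ins & dels2) and not (dels & ins2)

IDLE = (set(), set())   # the idle update id_A
```

As the formal requirement demands, the idle update is compatible with every update on either side.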

Definition 1 (Model spaces). *A* model space *is a pair* **A** = (|**A**|, K) *with* |**A**| *a category (the* carrier*) of models and updates and* K *a family as specified above. A* model space functor *from* **A** *to* **B** *is a functor* F: |**A**| → |**B**| *such that* (u, u′) ∈ K<sub>A</sub> *implies* (F(u), F(u′)) ∈ K<sub>F(A)</sub>*. We will denote model spaces and their carriers by the same symbol and often omit explicit mention of* K*.*

In the sequel, we will work with families of model spaces indexed by a finite set I, whose elements can be seen as space *names*. To simplify notation, we will assume that <sup>I</sup> <sup>=</sup> {1,...,n} although ordering will not play any role in our framework. Given a tuple of model spaces **A**1,..., **A**n, we will refer to objects and arrows of the product category **A**<sup>1</sup> ×···× **A**<sup>n</sup> as *model tuples* and *update tuples* or, sometimes, as *discrete multimodels/multiupdates*.

Definition 2 (Multispace/Multimodels). *Let* n ≥ 2 *be a natural number.*

*(i) An* n-ary multimodel space, *or just an* n*-ary* multispace, *A is given by a family of model spaces* ∂A = (**A**1,..., **A**n) *called the* boundary *of A, and a set A of elements called* corrs*, along with a family of functions* (∂<sub>i</sub>: *A* → **A**<sup>•</sup><sub>i</sub>)<sub>i≤n</sub> *providing every corr* R *with its* boundary *∂R = (∂<sub>1</sub>R...∂<sub>n</sub>R), i.e., a tuple of models taken from the multispace boundary, one model per space. Intuitively, a corr is understood as a consistent correspondence specification interrelating the models of its boundary (and in this paper, all corrs are assumed consistent).*

*Given a model tuple* (A1...An)*, we write A*(A1...An) *for the set of all corrs* R *with* ∂R = (A1...An)*; we call the models* A<sub>i</sub> *the* feet *of* R*. Respectively, the spaces* **A**<sub>i</sub> *are the* feet *of A, and we write* ∂<sub>i</sub>*A for* **A**<sub>i</sub>*.*

*(ii) An* (aligned consistent) multimodel *over a multispace A is a model tuple* (A1...An) *along with a corr* R ∈ *A*(A1...An) *relating the models. A* multimodel update u: (A1...An, R) → (A′1...A′n, R′) *is a tuple of updates* (u<sub>1</sub>: A<sub>1</sub> → A′<sub>1</sub>, ..., u<sub>n</sub>: A<sub>n</sub> → A′<sub>n</sub>)*.*

Note that any corr R uniquely defines a multimodel via the corr's boundary function ∂. We will also need the set of all corrs for some fixed A ∈ **A**<sup>•</sup><sub>i</sub> for a given i: *A*<sub>i</sub>(A, \_) <sup>def</sup>= {R ∈ *A* | ∂<sub>i</sub>R = A}.

The *running example* of Sect. 2 gives rise to a 3-ary multimodel space. For i ≤ 3, space **A**<sub>i</sub> consists of all models instantiating metamodel M<sub>i</sub> in Fig. 1 and their updates. To obtain a consistent multimodel (A1A2A3, R) from the one shown in Fig. 2, we can add to A1 an empl-er-link connecting Mary to IBM, add to A3 a commute with from = a1 and to = a15, and form a corr-set R = {(p1, p2), (c1, c2)} (all other corr-links are derivable from this data).

#### 3.3 Update Propagation and Multiary (Delta) Lenses

Update policies described in Sect. 2 can be extended to cover propagation of all updates u<sub>i</sub>, i ∈ 1...3, according to the pattern in Fig. 3. This is a non-trivial task, but after it is accomplished, we have the following synchronization structure.

Definition 3 (Symmetric lenses). *An* n*-ary* symmetric lens *is a pair* ℓ = (*A*, ppg) *with A an* n*-ary multispace called the* carrier *of* ℓ*, and* (ppg<sub>i</sub>)<sub>i≤n</sub> *an* n*-tuple of operations of the following arities. Operation* ppg<sub>i</sub> *takes a corr* R *(in fact, a multimodel) with boundary* ∂R = (A1...An) *and an update* u<sub>i</sub>: A<sub>i</sub> → A′<sub>i</sub> *as its input, and returns:*

*(a) updates* u′<sub>j</sub>: A<sub>j</sub> → A′<sub>j</sub> *for all* j ≠ i*;*
*(b) a reflective update (amendment)* u′<sub>i</sub>: A′<sub>i</sub> → A″<sub>i</sub>*;*
*(c) a new corr* R′ *with boundary* (A′1...A″<sub>i</sub>...A′n)*.*
*In fact, the operations* ppg<sub>i</sub> *complete a local update* u<sub>i</sub> *to an entire multimodel update with components* (u′<sub>j</sub>)<sub>j≠i</sub> *and* u<sub>i</sub>; u′<sub>i</sub> *(see Fig. 3).*

Notation. If the first argument R of operation ppg<sub>i</sub> is fixed, the corresponding family of unary operations (whose only argument is u<sub>i</sub>) will be denoted by ppg<sup>R</sup><sub>i</sub>. By taking the j-th component of the multi-element result, we obtain single-valued unary operations ppg<sup>R</sup><sub>ij</sub> producing, resp., updates u′<sub>j</sub> = ppg<sup>R</sup><sub>ij</sub>(u<sub>i</sub>): A<sub>j</sub> → A′<sub>j</sub>. Note that the domain of u′<sub>j</sub> is A<sub>j</sub> for all j ≠ i (see clause (a) of the definition), while ppg<sup>R</sup><sub>ii</sub> yields the reflective update (b). We also have an operation ppg<sup>R</sup><sub>i*</sub> returning a new consistent corr R′ = ppg<sup>R</sup><sub>i*</sub>(u<sub>i</sub>) according to (c).
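One possible Python rendering of this notation (the class shape and dictionary encoding are our assumptions, not the paper's) keeps ppg as a single function and derives the components ppg<sub>ij</sub> and ppg<sub>i*</sub> from it:

```python
class SymmetricLens:
    """n-ary symmetric lens: ppg(i, R, u_i) returns a dict of propagated
    updates {j: u'_j for j != i}, the reflective amendment u'_i, and
    the new corr R'."""
    def __init__(self, n, ppg):
        self.n, self._ppg = n, ppg

    def ppg_ij(self, i, j, R, u_i):
        """The single-valued component ppg^R_ij."""
        propagated, amendment, _ = self._ppg(i, R, u_i)
        return amendment if j == i else propagated[j]

    def ppg_i_star(self, i, R, u_i):
        """The corr-producing component ppg^R_i*."""
        return self._ppg(i, R, u_i)[2]

# A trivial binary instance: copy the update to the other foot,
# amend nothing ("id" stands for the identity update), keep the corr.
copy_lens = SymmetricLens(2, lambda i, R, u: ({1 - i: u}, "id", R))
```

The trivial `copy_lens` instance is only there to make the interface concrete; any real lens would encode actual propagation policies in its `ppg` function.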

Definition 4 (Closed updates). *Given a lens* ℓ = (*A*, ppg) *and a corr* R ∈ *A*(A1...An)*, we call an update* u<sub>i</sub>: A<sub>i</sub> → A′<sub>i</sub> R*-*closed *if* ppg<sup>R</sup><sub>ii</sub>(u<sub>i</sub>) = id<sub>A′<sub>i</sub></sub>*. An update is* ℓ*-*closed *if it is* R*-closed for all* R*. Lens* ℓ *is called* non-reflective *at foot* **A**<sub>i</sub> *if all updates in* **A**<sup>→</sup><sub>i</sub> *are* ℓ*-closed.*

For the *Running example*, update propagation policies described in Sect. 2 give rise to a lens non-reflective at space **A**3.

Definition 5 (Well-behavedness). *A lens* ℓ = (*A*, ppg) *is called* well-behaved (wb) *if the following laws hold for all* i ≤ n*,* A<sub>i</sub> ∈ **A**<sup>•</sup><sub>i</sub>*,* R ∈ *A*<sub>i</sub>(A<sub>i</sub>, \_) *and* u<sub>i</sub>: A<sub>i</sub> → A′<sub>i</sub>*, cf. Fig. 3.*

$$\begin{array}{ll}
(\mathsf{Stability})_{i} & \forall j \in \{1..n\}:\ \mathsf{ppg}_{ij}^{R}(\mathsf{id}_{A_{i}}) = \mathsf{id}_{A_{j}} \ \text{and} \ \mathsf{ppg}_{i*}^{R}(\mathsf{id}_{A_{i}}) = R\\
(\mathsf{Reflect1})_{i} & (u_{i}, u_{i}') \in \mathsf{K}_{A_{i}'}\\
(\mathsf{Reflect2})_{i} & \forall j \neq i:\ \mathsf{ppg}_{ij}^{R}(u_{i}; u_{i}') = \mathsf{ppg}_{ij}^{R}(u_{i})\\
(\mathsf{Reflect3})_{i} & \mathsf{ppg}_{ii}^{R}(u_{i}; u_{i}') = \mathsf{id}_{A_{i}''}
\end{array}$$
where $u_{i}' = \mathsf{ppg}_{ii}^{R}(u_{i})$ as in Definition 3.

Stability says that lenses do nothing voluntarily. Reflect1 says that the amendment works towards "completion" rather than "undoing", and Reflect2-3 are idempotency conditions ensuring that the completion is indeed done.

Definition 6 (Invertibility). *A wb lens is called* (weakly) invertible *if it satisfies the following law for any* i*, update* u<sub>i</sub>: A<sub>i</sub> → A′<sub>i</sub>*, and* R ∈ *A*<sub>i</sub>(A<sub>i</sub>, \_)*:*

(Invert)<sub>i</sub> *for all* j ≠ i*:* ppg<sup>R</sup><sub>ij</sub>(ppg<sup>R</sup><sub>ji</sub>(ppg<sup>R</sup><sub>ij</sub>(u<sub>i</sub>))) = ppg<sup>R</sup><sub>ij</sub>(u<sub>i</sub>)

This law deals with "round-tripping": operation ppg<sup>R</sup><sub>ji</sub> applied to the update u<sub>j</sub> = ppg<sup>R</sup><sub>ij</sub>(u<sub>i</sub>) results in an update û<sub>i</sub> equivalent to u<sub>i</sub> in the sense that ppg<sup>R</sup><sub>ij</sub>(û<sub>i</sub>) = ppg<sup>R</sup><sub>ij</sub>(u<sub>i</sub>) (see [3] for a motivating discussion).

*Example 1 (Identity lens* ℓ(n**A**)*).* Let **A** be an arbitrary model space. It generates an n-ary lens ℓ(n**A**) as follows. The carrier *A* has n identical model spaces: **A**<sub>i</sub> = **A** for all i ∈ {1, .., n}; the corr set is *A* = **A**<sup>•</sup>, and the boundary functions are identities. All updates are propagated to themselves (hence the name of ℓ(n**A**)). Obviously, ℓ(n**A**) is a wb, invertible lens non-reflective at all its feet.
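Example 1 admits a direct executable sketch. Here models are sets, an update is either the identity `("id",)` or an `("upd", inserts, deletes)` triple, and the corr is the shared model state; all of these encodings are our assumptions:

```python
def apply_update(model, u):
    """Apply an update to a model state (a frozenset)."""
    if u == ("id",):
        return model
    _, inserts, deletes = u
    return (model - deletes) | inserts

def identity_lens_ppg(n, i, R, u):
    """ppg_i of the identity lens: copy u to every other foot, amend
    nothing, and move the corr to the updated state."""
    propagated = {j: u for j in range(n) if j != i}
    return propagated, ("id",), apply_update(R, u)

# Stability check: propagating an identity update changes nothing.
prop, amend, corr = identity_lens_ppg(3, 0, frozenset({"Mary"}), ("id",))
```

The last line exercises the Stability law: identity in, identities out, corr unchanged.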

## 4 Compositionality of Update Propagation: Playing Lego with Lenses

We study how lenses can be composed. Parallel compositions are easy to manage and are excluded from the paper to save space (they can be found in the long version [1, Sect. 4.1]). More challenging are sequential constructs, in which different lenses share some of their feet, and updates propagated by one lens are taken and propagated further by one or several other lenses. In Sect. 4.1, we consider a rich example of such a construct, the *star* composition of lenses. In Sect. 4.2, we study how (symmetric) lenses can be assembled from asymmetric ones.

Since we now work with several lenses, we need notation for a lens' components. Given a lens ℓ = (*A*, ppg), we overload the symbol ℓ to also denote its set of corrs *A*. Feet are written ∂<sub>i</sub>ℓ (the i-th boundary space), and ∂<sub>i</sub>R is the i-th boundary of a corr R ∈ ℓ. The propagation operations of the lens are denoted by ℓ.ppg<sup>R</sup><sub>ij</sub> and ℓ.ppg<sup>R</sup><sub>i*</sub>.

#### 4.1 Star Composition

Running example continued. The diagram in Fig. 4 presents a refinement of our example, which explicitly includes relational storage models B1,2 for the two data sources. We assume that the object models A1,2 are simple projective views of the databases B1,2: data in A<sub>i</sub> are copied from B<sub>i</sub> without any transformation, while additional tables and attributes that the B<sub>i</sub>-data may have are excluded from the view A<sub>i</sub>; the traceability mappings R<sub>i</sub>: A<sub>i</sub> ↔ B<sub>i</sub> are thus embeddings.


We further assume that synchronization of the bases B<sub>i</sub> and their views A<sub>i</sub> is realized by simple *constant-complement* lenses b<sub>i</sub>, i = 1, 2 (see, e.g., [9]). Finally, let k be a lens synchronizing models A1, A2, A3 as described in Sect. 2, and R ∈ k(A1, A2, A3) a corr for some A3 not shown in the figure.

Consider the following update propagation scenario. Suppose that at some moment we have consistency (R1, R, R2) of all five models, and then B1 is updated with u<sub>1</sub>: B1 → B′1 that, say, adds to B1 a record of Mary working for Google as discussed in Sect. 2. Consistency is restored with a four-step propagation procedure shown by double-arrows labelled x:y, with x the step number and y the lens doing the propagation. Step 1: lens b<sub>1</sub> propagates update u<sub>1</sub> to v<sub>1</sub>, which adds (Mary, Google) to view A1, with no amendment to u<sub>1</sub> as v<sub>1</sub> is just a projection of u<sub>1</sub>. Note also the updated traceability mapping R′1: B′1 ↔ A′1. Step 2: lens k propagates v<sub>1</sub> to v<sub>2</sub>, which adds (Google, Mary) to A2, and amends v<sub>1</sub> with v′<sub>1</sub>, which adds (Mary, IBM) to A′1; a new consistent corr R′ is also computed. Step 3: lens b<sub>2</sub> propagates v<sub>2</sub> to u<sub>2</sub>, which adds Mary's employment by Google to B2 with, perhaps, some other specific relational storage changes not visible in A2. We assume no amendment to v<sub>2</sub>, as otherwise access to the relational storage would amend application data, and thus we have a consistent corr R′2 as shown. Step 4: lens b<sub>1</sub> maps update v′<sub>1</sub> (see Step 2) backward to u′<sub>1</sub>, which adds (Mary, IBM) to B′1, so that B″1 includes both (Mary, Google) and (Mary, IBM), and a respective consistent corr R″1 is provided. There is no amendment for v′<sub>1</sub> for the same reason as in Step 3.

Thus, all five models in the bottom line of Fig. 4 (A′3 is not shown) are mutually consistent and all show that Mary is employed by IBM and Google. Synchronization is restored, and we can consider the entire scenario as propagation of u<sub>1</sub> to u<sub>2</sub> and its amendment with u′<sub>1</sub>, so that finally we have a consistent corr (R″1, R′, R′2) interrelating B″1, A′3, B′2. Amendment u′<sub>1</sub> is compatible with u<sub>1</sub> as nothing is undone, and condition (u<sub>1</sub>, u′<sub>1</sub>) ∈ K<sub>B′1</sub> holds; the other two equations required by Reflect2-3 for the pair (u<sub>1</sub>, u′<sub>1</sub>) also hold. For our simple projection views, these conditions will hold for other updates too, and we have a well-behaved propagation from B1 to B2 (and trivially to A3). Similarly, we have a wb propagation from B2 to B1 and A3. Propagation from A3 to B1,2 is non-reflective and done in two steps: first lens k works, then lenses b<sub>i</sub> work as described above (and the updates produced by k are b<sub>i</sub>-closed). Thus, we have built a wb ternary lens synchronizing spaces **B**1, **B**2 and **A**3 by joining lenses b<sub>1</sub> and b<sub>2</sub> to the central lens k.

Discussion. Reflection is a crucial aspect of lens composition. The inset figure describes the scenario above as a transition system and shows that Steps 3 and 4 can go concurrently. It is the non-trivial amendment created in Step 2 that causes the necessity of Step 4; otherwise Step 3 would finish consistency restoration (with Step 4 being an idle transition). On the other hand, if update v<sub>2</sub> in Fig. 4 were not closed for lens b<sub>2</sub>, we would have yet another concurrent step complicating the scenario. Fortunately for our example with simple projective views, Step 4 is simple and provides a non-conflicting amendment, but the case of more complex views beyond the constant-complement class needs care and investigation. Below we specify a simple situation of lens composition with reflection a priori excluded, and leave more complex cases for future work.

Formal definition. Suppose we have an n-ary lens k = (*A*, ppg) and, for every i ≤ n, a binary lens b<sub>i</sub> = (**A**<sub>i</sub>, **B**<sub>i</sub>, b<sub>i</sub>.ppg) with the first model space **A**<sub>i</sub> being the i-th model space of k (see Fig. 5, where k is depicted in the center and the b<sub>i</sub> are shown as ellipses adjoint to k's feet). We also assume the following *junction conditions: for any* i ≤ n*, all updates propagated to* **A**<sub>i</sub> *by lens* b<sub>i</sub> *are* k*-closed, and all updates propagated to* **A**<sub>i</sub> *by lens* k *are* b<sub>i</sub>*-closed.*

Fig. 5. Star composition

Below we will write a corr R<sub>i</sub> ∈ b<sub>i</sub>(A<sub>i</sub>, B<sub>i</sub>) as R<sub>i</sub>: A<sub>i</sub> ↔ B<sub>i</sub>, and the six-tuple of operations b<sub>i</sub>.ppg<sup>R<sub>i</sub></sup> as the family {b<sub>i</sub>.ppg<sup>R<sub>i</sub></sup><sub>xy</sub> | x ∈ {**A**, **B**}, y ∈ {**A**, **B**, *}}. Likewise we write ∂<sup>b<sub>i</sub></sup><sub>x</sub> with x ∈ {**A**, **B**} for the boundary functions of the lenses b<sub>i</sub>.

The above configuration gives rise to the following n-ary lens ℓ. The carrier is the tuple of model spaces **B**1...**B**n, and corrs are tuples (R, R1...Rn) with R ∈ k and R<sub>i</sub> ∈ b<sub>i</sub>, such that ∂<sub>i</sub>R = ∂<sup>b<sub>i</sub></sup><sub>**A**</sub>R<sub>i</sub> for all i ∈ 1..n. Moreover, we define ∂<sub>i</sub>(R, R1...Rn) <sup>def</sup>= ∂<sup>b<sub>i</sub></sup><sub>**B**</sub>R<sub>i</sub> (see Fig. 5). Operations are defined as compositions of consecutive lens executions as described below (we will use dot notation for operation application and write x.op for op(x), where x is an argument).

Given a model tuple (B1...Bn) ∈ **B**1 × ... × **B**n, a corr (R, R1...Rn), and an update v<sub>i</sub>: B<sub>i</sub> → B′<sub>i</sub> in **B**<sup>→</sup><sub>i</sub>, we define, first for j ≠ i,

$$v_{i}.\,\ell.\mathsf{ppg}_{ij}^{(R,R_{1}\ldots R_{n})} \stackrel{\text{def}}{=} v_{i}.(b_{i}.\mathsf{ppg}_{\mathbf{BA}}^{R_{i}}).(k.\mathsf{ppg}_{ij}^{R}).(b_{j}.\mathsf{ppg}_{\mathbf{AB}}^{R_{j}}),$$

and v<sub>i</sub>.ℓ.ppg<sup>(R,R1...Rn)</sup><sub>ii</sub> <sup>def</sup>= v<sub>i</sub>.b<sub>i</sub>.ppg<sup>R<sub>i</sub></sup><sub>**BB**</sub> for j = i. Note that all internal amendments (to u<sub>i</sub> = v<sub>i</sub>.(b<sub>i</sub>.ppg<sup>R<sub>i</sub></sup><sub>**BA**</sub>) produced by k, and to u′<sub>j</sub> = u<sub>i</sub>.(k.ppg<sup>R</sup><sub>ij</sub>) produced by b<sub>j</sub>) are identities due to the junction conditions. This allows us to set corrs properly and finish propagation with the three steps above: v<sub>i</sub>.ℓ.ppg<sup>(R,R1...Rn)</sup><sub>i*</sub> <sup>def</sup>= (R′, R′1...R′n), where R′ = u<sub>i</sub>.k.ppg<sup>R</sup><sub>i*</sub>, R′<sub>j</sub> = u′<sub>j</sub>.b<sub>j</sub>.ppg<sup>R<sub>j</sub></sup><sub>**A***</sub> for j ≠ i, and R′<sub>i</sub> = v<sub>i</sub>.b<sub>i</sub>.ppg<sup>R<sub>i</sub></sup><sub>**B***</sub>. We thus have a lens denoted by k★(b1,..., bn).
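Under the junction conditions, the displayed formula is plain function composition. A minimal sketch (updates are sets of employment pairs; the three component maps are stand-ins we chose for b<sub>i</sub>.ppg<sub>**BA**</sub>, k.ppg<sub>ij</sub> and b<sub>j</sub>.ppg<sub>**AB**</sub>):

```python
def star_ppg_ij(b_i_get, k_ppg_ij, b_j_put):
    """ppg_ij of the composed lens k*(b_1, ..., b_n): project the base
    update into the view, propagate it through the central lens,
    then put it back into the other base."""
    return lambda v_i: b_j_put(k_ppg_ij(b_i_get(v_i)))

# Projective views copy data unchanged; the central lens mirrors
# (person, company) insertions into (company, person) insertions.
ppg_12 = star_ppg_ij(
    lambda u: set(u),                      # b_1: project B1-update into A1
    lambda u: {(c, p) for (p, c) in u},    # k: empl-er -> empl-ee
    lambda u: set(u))                      # b_2: embed A2-update into B2
```

With these stand-ins, propagating the insertion of (Mary, Google) in B1 yields the insertion of (Google, Mary) in B2, matching the running example.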

Theorem 1 (Star composition). *Given a star configuration of lenses as above, if lens* k *fulfills Stability, all lenses* b<sub>i</sub> *are wb, and the junction conditions hold, then the composed lens* k★(b1,..., bn) *defined above is wb, too.*

*Proof.* Laws Stability and Reflect1 for the composed lens are straightforward. Reflect2-3 also follow immediately, since the first step of the above propagation procedure already enjoys idempotency by Reflect2-3 for b<sub>i</sub>.

#### 4.2 Assembling *n*-ary Lenses from Binary Lenses

This section shows how to assemble n-ary (symmetric) lenses from binary asymmetric lenses modelling view computation [2]. As the latter is a typical bx, the well-behavedness of asymmetric lenses has important distinctions from the well-behavedness of general (symmetric, mx-tailored) lenses.

Definition 7 (Asymmetric lens, cf. [2]). *An* asymmetric lens (a-lens) *is a tuple* b = (**A**, **B**, get, put) *with* **A** *a model space called the* (abstract) view*,* **B** *a model space called the* base*,* get: **A** ← **B** *a functor (read "get the view"), and* put *a family of operations* {put<sup>B</sup> | B ∈ **B**<sup>•</sup>} *(read "put the view update back") of the following arity. Provided with a view update* v: get(B) → A′ *at the input, operation* put<sup>B</sup> *outputs a base update* put<sup>B</sup><sub>b</sub>(v) = u′: B → B′ *and a reflected view update* put<sup>B</sup><sub>v</sub>(v) = v′: A′ → A″ *such that* A″ = get(B′)*. A view update* v: get(B) → A′ *is called* closed *if* put<sup>B</sup><sub>v</sub>(v) = id<sub>A′</sub>*.*

The following is a specialization of Definition 5.

Definition 8 (Well-behavedness). *An a-lens is* well-behaved (wb) *if it satisfies the following laws for all* B ∈ **B**<sup>•</sup> *and* v: get(B) → A′*:*

(Stability) put<sup>B</sup><sub>b</sub>(id<sub>get(B)</sub>) = id<sub>B</sub>

(Reflect0) put<sup>B</sup><sub>v</sub>(v) = id<sub>A′</sub> *implies* A′ = get(X) *for some* X ∈ **B**<sup>•</sup>

(Reflect1) (v, v′) ∈ K<sub>A′</sub>

(Reflect2) put<sup>B</sup> b (v; put<sup>B</sup> v (v)) = put<sup>B</sup> b (v)

(PutGet) v; put<sup>B</sup> v (v) = get(put<sup>B</sup> b (v)) 

In contrast to the general lens case, a wb a-lens features Reflect0—a sort of self-Hippocraticness important for bx. Another distinction is the inclusion of a strong invertibility law, PutGet, into the definition of well-behavedness: PutGet together with Reflect2 provides (weak) invertibility: put<sup>B</sup><sub>b</sub>(get(put<sup>B</sup><sub>b</sub>(v))) = put<sup>B</sup><sub>b</sub>(v). Reflect3 is omitted as it is implied by Reflect0 and PutGet.
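To make the shape of these operations concrete, here is a minimal Python sketch, not the paper's categorical formalism: models are dicts, the view is a projection onto selected keys, and deltas are approximated by (old, new) state pairs, so the reflected update is always an identity and all the laws hold trivially. All names here are illustrative.

```python
# Hedged sketch of an asymmetric lens (a-lens): models are dicts, the
# view projects onto selected keys, updates are (old, new) state pairs.
from dataclasses import dataclass

@dataclass(frozen=True)
class ALens:
    keys: tuple  # keys visible in the view

    def get(self, base: dict) -> dict:
        return {k: base[k] for k in self.keys}

    def put(self, base: dict, view_update):
        """Given a view update (get(base), new_view), return the base
        update and the reflected view update (identity here: every view
        state is consistent, so no amendment is needed)."""
        old_view, new_view = view_update
        assert old_view == self.get(base)
        new_base = {**base, **new_view}             # put the view back
        base_update = (base, new_base)
        reflected = (new_view, self.get(new_base))  # identity amendment
        return base_update, reflected

lens = ALens(keys=("name",))
B = {"name": "Ada", "salary": 1}
u, v_refl = lens.put(B, (lens.get(B), {"name": "Grace"}))

# Stability: putting the identity view update changes nothing.
idu, _ = lens.put(B, (lens.get(B), lens.get(B)))
assert idu == (B, B)
# PutGet: the amended view state equals get of the updated base.
assert v_refl[1] == lens.get(u[1])
```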

Any a-lens *b* = (**A**, **B**, get, put) gives rise to a binary symmetric lens. Its carrier consists of the model spaces **A** and **B**, and its set of corrs is **B**•, with boundary mappings defined as follows: for R ∈ **B**•, ∂<sub>**A**</sub>(R) = get(R) and ∂<sub>**B**</sub>(R) = R. Thus, the set of corrs between A and B is {B} if A = get(B), and is empty otherwise.

For a corr B, we need to define six operations ppg<sup>B</sup>. If v: A → A' is a view update, then ppg<sup>B</sup><sub>**AB**</sub>(v) = put<sup>B</sup><sub>b</sub>(v): B → B', ppg<sup>B</sup><sub>**AA**</sub>(v) = put<sup>B</sup><sub>v</sub>(v): A' → A'', and the updated corr is ppg<sup>B</sup><sub>**A**</sub>(v) = B'. The condition A'' = get(B') from the definition of *b* means that B' is again a consistent corr with the desired boundaries. For a base update u: B → B' and corr B, ppg<sup>B</sup><sub>**BA**</sub>(u) = get(u), ppg<sup>B</sup><sub>**BB**</sub>(u) = id<sub>B'</sub>, and ppg<sup>B</sup><sub>**B**</sub>(u) = B'. Functoriality of get yields consistency of B'.
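The six propagation operations can be sketched as plain functions; this is an illustrative toy, assuming models are pairs whose first component is the view and updates are (old, new) state pairs, with ppg_* names mirroring the operations in the text.

```python
# Hedged sketch: wrapping an asymmetric lens as a binary symmetric lens.
# A corr is simply a base model B with boundaries (get(B), B).

def get(base):                 # view = first component of the pair
    return base[0]

def ppg_AB(corr, v):           # propagate a view update to the base
    (a0, a1) = v
    assert a0 == get(corr)     # v must depart from get(B)
    new_base = (a1, corr[1])   # put: replace the view part, keep the rest
    return (corr, new_base)

def ppg_AA(corr, v):           # reflected view update (identity here)
    return (v[1], v[1])

def ppg_BA(corr, u):           # propagate a base update to the view
    return (get(u[0]), get(u[1]))

def ppg_BB(corr, u):           # base updates are closed: no amendment
    return (u[1], u[1])

corr = ("view", "payload")
u = ppg_AB(corr, ("view", "VIEW"))
assert u[1] == ("VIEW", "payload")
assert ppg_BA(corr, u) == ("view", "VIEW")  # PutGet: get of the base update
```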

Lemma 1. *Let b be a wb a-lens and b the corresponding symmetric lens. Then all base updates of b are closed, and b is wb and invertible.*

*Proof.* Base updates are closed by the definition of ppg**BB**. Well-behavedness follows from wb-ness of *b*. Invertibility has to be proved in two directions: ppg**BA**; ppg**AB**; ppg**BA** = ppg**BA** follows from (PutGet) and (Reflect0), the other direction follows from (PutGet) and (Reflect2), see the remark after Definition 8. 

Theorem 2 (Lenses from Spans). *An* n*-ary span of wb a-lenses* b<sup>i</sup> = (**A**<sub>i</sub>, **B**, get<sub>i</sub>, put<sub>i</sub>), i = 1..n, *with common base* **B** *of all* b<sup>i</sup> *gives rise to a wb (symmetric) lens denoted by* Σ<sup>n</sup><sub>i=1</sub> b<sup>i</sup>.

*Proof.* An n-ary span of a-lenses b<sup>i</sup> (all of them interpreted as symmetric lenses as explained above) is a construct equivalent to the star composition of Definition 4.1.3, in which lens *k* = (n**B**) (cf. Example 1) and the peripheral lenses are the lenses b<sup>i</sup>. The junction condition is satisfied, as all base updates are closed for all b<sup>i</sup> by Lemma 1, and also trivially closed for any identity lens. The theorem thus follows from Theorem 1. Note that a corr in Σ<sup>n</sup><sub>i=1</sub> b<sup>i</sup> is nothing but a single model B ∈ **B**• with boundaries being the respective get<sub>i</sub>-images.

The theorem shows that combining a-lenses in this way yields an n-ary symmetric lens, whose properties can automatically be inferred from the binary a-lenses.
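The span construction of Theorem 2 can be sketched in a few lines; this is an illustrative toy under the same assumptions as before (models are dicts, each foot sees a key projection), not the paper's construction itself.

```python
# Hedged sketch of Theorem 2: an n-ary lens assembled from a span of
# asymmetric lenses over one shared base. An update arriving at foot i
# is put into the comprehensive base model and then propagated to every
# foot via its get (projection).

def make_span_lens(views):
    # views: foot index -> tuple of keys visible at that foot
    def get(i, base):
        return {k: base[k] for k in views[i]}

    def propagate(base, i, new_view_i):
        new_base = {**base, **new_view_i}  # put_i into the shared base
        return new_base, {j: get(j, new_base) for j in views}

    return get, propagate

get, propagate = make_span_lens({1: ("a",), 2: ("a", "b"), 3: ("b",)})
base = {"a": 0, "b": 0}
new_base, views_after = propagate(base, 1, {"a": 7})
assert new_base == {"a": 7, "b": 0}
assert views_after[2] == {"a": 7, "b": 0}  # change reached foot 2
```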

*Running example*. Figure 6 shows a metamodel M<sup>+</sup> obtained by merging the three metamodels M<sub>1,2,3</sub> from Fig. 1 without loss or duplication of information. In addition, for persons and companies, the identifiers of the model spaces in which a given person or company occurs can be traced back via the attribute "spaces" (Commute objects are known to appear in space **A**<sub>3</sub> and hence do not need such an attribute). As shown in [10], any consistent multimodel (A<sub>1</sub>...A<sub>n</sub>, R) can be merged into a comprehensive model A<sup>+</sup> instantiating M<sup>+</sup>. Let **B** be the space of such comprehensive models together with their comprehensive updates u<sup>+</sup>: A<sup>+</sup> → A'<sup>+</sup>.

For a given i ≤ 3, we can define the following a-lens b<sup>i</sup> = (**A**<sub>i</sub>, **B**, get<sub>i</sub>, put<sub>i</sub>): get<sub>i</sub> takes an update u<sup>+</sup> as above and outputs its restriction to the model containing only the objects recorded in space **A**<sub>i</sub>. Operation put<sub>i</sub> takes an update v<sub>i</sub>: A<sub>i</sub> → A'<sub>i</sub>, first propagates it in all directions as discussed in Sect. 2, and then merges these propagated local updates into a comprehensive


Fig. 6. Merged metamodel

**B**-update between comprehensive models. This yields a span of a-lenses that implements the same synchronization behaviour as the symmetric lens discussed in Sect. 2.

*From lenses to spans.* There is also a backward transformation of (symmetric) lenses into spans of a-lenses. Let ℓ = (*A*, ppg) be a wb lens. It gives rise to the following span of wb a-lenses ℓ<sup>i</sup> = (∂<sub>i</sub>(*A*), **B**, get<sub>i</sub>, put<sub>i</sub>), where space **B** is built from consistent multimodels and their updates, and the functors get<sub>i</sub>: **B** → **A**<sub>i</sub> are projection functors. Given B = (A<sub>1</sub>...A<sub>n</sub>, R) and an update u<sub>i</sub>: A<sub>i</sub> → A'<sub>i</sub>, let

$$\mathfrak{put}\_{i\mathbf{b}}^{B}(u\_{i}) \stackrel{\text{def}}{=} (u'\_{1}, \ldots, u'\_{i-1}, (u\_{i}; u'\_{i}), u'\_{i+1}, \ldots, u'\_{n}) \colon (A\_{1} \ldots A\_{n}, R) \to (A\_{1}^{\prime\prime} \ldots A\_{n}^{\prime\prime}, R^{\prime\prime}) $$

where u'<sub>j</sub> = ppg<sup>R</sup><sub>ij</sub>(u<sub>i</sub>) (for all j) and R'' = ppg<sup>R</sup><sub>i</sub>(u<sub>i</sub>). Finally, put<sup>B</sup><sub>i,v</sub>(u<sub>i</sub>) = ppg<sup>R</sup><sub>ii</sub>(u<sub>i</sub>). Validity of Stability, Reflect0–2, and PutGet follows directly from the above definitions.

An open question is whether the span-to-lens transformation in Theorem 2 and the lens-to-span transformation described above are mutually inverse. The results for the binary case in [8] show that this is only the case modulo certain equivalence relations. These equivalences may be different for our reflective multiary lenses, and we leave this important question for future research.

### 5 Related Work

For state-based lenses, the work closest in spirit is Stevens' paper [16]. Her goals and ours are similar, but the technical realisations are different, even beyond the state- vs. delta-based opposition. Stevens works with restorers, which take a presumably *inconsistent* multimodel (in the state-based setting, just a tuple of models) and restore consistency by changing some models in the tuple while keeping the other models (those in the *authority set*) unchanged. In contrast, lenses take a *consistent* multimodel *and* updates, and return a consistent multimodel and updates. Also, update amendments are not considered in [16]: models in the authority set are kept intact.

Another distinction is how the multiary vs. binary issue is treated. Stevens provides several results on decomposing an n-ary relation *A* into binary relations *A*<sub>ij</sub> ⊆ **A**<sub>i</sub> × **A**<sub>j</sub> between the components. For us, a relation is a span, i.e., a set *A* endowed with an n-tuple of projections ∂<sub>i</sub>: *A* → **A**<sub>i</sub> uniquely identifying the elements of *A*. Thus, while Stevens considers "binarisation" of a relation R over its boundary A<sub>1</sub>...A<sub>n</sub>, we "binarise" it via the corresponding span (in UML terms, a reification). Our (de)composition results demonstrate the advantages of the span view. A discussion of several other works in the state-based world, notably by Macedo *et al.* [12], can be found in [16].

Compositionality as a fundamental principle for building synchronization tools was proposed by Pierce and his coauthors, and realized for several types of binary lenses in [4,6,7]. In the delta-lens world, a fundamental theory of equivalence of symmetric lenses and spans of a-lenses (for the binary case) is developed by Johnson and Rosebrugh [8], but they do not consider reflective updates. The PutGetPut law has been discussed (in a different context of state-based asymmetric injective editing) in several early bx works from Tokyo, e.g., [13]. A notion close to our update compatibility was proposed by Orejas *et al.* [14]. We are not aware of multiary update propagation work in the delta-lens world. Considering amendment and its laws in the delta-lens setting is also new.

In [11], Königs and Schürr introduced multigraph grammars (MGGs) as a multiary version of the well-known triple graph grammars (TGGs). Their multi-domain integration rules specify how all involved graphs evolve simultaneously. The idea of an additional correspondence graph is close to our consistent corrs. However, their scenarios are specialized towards (1) directed graphs, (2) MOF-compliant artifacts like QVT, and (3) a global consistency view on a multimodel rather than update propagation.

#### 6 Conclusions and Future Work

We have considered multiple model synchronization via multi-directional update propagation, and argued that reflective propagation back to the model whose change caused the inconsistency is a reasonable feature of this scenario. We presented a mathematical framework for such synchronization, based on a multiary generalisation of the binary symmetric delta lenses introduced earlier in [3], enriched with reflective propagation. Our lens composition results make the framework interesting for practical applications, but so far it has an essential limitation: we consider consistency violations caused by only one model change, so that consistency is restored by propagating only one update, while in practice we often deal with several models changing concurrently. If these updates are in conflict, consistency restoration requires conflict resolution, and hence an essential extension of the framework.

There are also several open issues for the non-concurrent case considered in the paper (and for its future concurrent generalisation). First, our pool of lens composition constructs is far from complete (because of both space limitations and the necessity of further research). We need to enrich it with (i) sequential composition of (reflective) a-lenses, so that a category of a-lenses can be built, and (ii) a relational composition of symmetric lenses sharing several of their feet (similar to a relational join). It is also important to investigate composition under weaker junction conditions than those we considered. Another important issue is invertibility, which fits nicely into some but not all of our results, a sign that we do not yet understand the nature of invertibility well. We conjecture that while invertibility is essential for bx, its role for mx may be less important. Last but not least, there is the (in)famous PutPut law: how well our update propagation operations are compatible with update composition is a very important issue to explore in the multiary reflective setting. Finally, paper [5] shows how binary delta lenses can be implemented with TGGs, and we expect that MGGs could play a similar role for multiary delta lenses.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Controlling the Attack Surface of Object-Oriented Refactorings**

Sebastian Ruland<sup>1(B)</sup>, Géza Kulcsár<sup>1</sup>, Erhan Leblebici<sup>1</sup>, Sven Peldszus<sup>2</sup>, and Malte Lochau<sup>1</sup>

<sup>1</sup> Real-Time Systems Lab, TU Darmstadt, Darmstadt, Germany {sebastian.ruland, geza.kulcsar, erhan.leblebici, malte.lochau}@es.tu-darmstadt.de <sup>2</sup> Institute for Software Technology, University of Koblenz-Landau, Koblenz, Germany speldszus@uni-koblenz.de

**Abstract.** Refactorings constitute an effective means to improve quality and maintainability of evolving object-oriented programs. Search-based techniques have shown promising results in finding optimal sequences of behavior-preserving program transformations that (1) maximize code-quality metrics and (2) minimize the number of changes. However, the impact of refactorings on extra-functional properties like security has received little attention so far. To this end, we propose as a further objective to minimize the attack surface of programs (i.e., to maximize strictness of declared accessibility of class members). Minimizing the attack surface naturally competes with applicability of established *MoveMethod* refactorings for improving coupling/cohesion metrics. Our tool implementation is based on an EMF meta-model for Java-like programs and utilizes MOMoT, a search-based model-transformation framework. Our experimental results gained from a collection of real-world Java programs show the impact of attack surface minimization on design-improving refactorings by using different accessibility-control strategies. We further compare the results to those of existing refactoring tools.

## **1 Introduction**

The essential activity in designing object-oriented programs is to identify class candidates and to assign *responsibility* (i.e., data and operations) to them. An appropriate solution to this *Class-Responsibility-Assignment (CRA)* problem, on the one hand, intuitively reflects the problem domain and, on the other hand, exhibits acceptable quality measures [4]. In this context, *refactoring* has become a key technique for agile software development: productive program-evolution phases are interleaved with behavior-preserving code transformations for updating CRA decisions, to proactively maintain, or even improve, code-quality metrics [13,29]. Each refactoring pursues a trade-off between two major, and generally contradicting, objectives: (1) maximizing code-quality metrics, including fine-grained coupling/cohesion measures as well as coarse-grained anti-pattern

A. Russo and A. Schürr (Eds.): FASE 2018, LNCS 10802, pp. 38–55, 2018. https://doi.org/10.1007/978-3-319-89363-1\_3

avoidance, and (2) minimizing the number of changes to preserve the initial program design as much as possible [8]. Manual search for refactorings sufficiently meeting both objectives becomes impracticable already for medium-size programs, as it requires finding optimal sequences of interdependent code transformations with complex constraints [10]. The very large search space and multiple competing objectives make the underlying optimization problem well-suited for search-based optimization [15], for which various semi-automated approaches for recommending refactorings have recently been proposed [18,27,28,30,34].

The validity of proposed refactorings is mostly concerned with purely *functional* behavior preservation [24], whereas their impact on *extra-functional* properties like program security has received little attention so far [22]. However, applying elaborated information-flow metrics for identifying security-preserving refactorings is computationally too expensive in practice [36]. As an alternative, we consider *attack-surface metrics* as a sufficiently reliable, yet easy-to-compute indicator for preservation of program security [20,41]. *Attack surfaces* of programs comprise all conventional ways for users/attackers to enter a software system (e.g., invoking API methods or inheriting from super-classes), such that an unnecessarily large surface increases the danger of exploiting vulnerabilities. Hence, the goal of a secure program design should be to grant least privileges to class members to reduce the extent to which data and operations are exposed to the world [41]. In Java-like languages, accessibility constraints by means of the modifiers public, private and protected provide a built-in low-level mechanism for controlling and restricting information flow within and across classes, sub-classes and packages [38]. Accessibility constraints introduce compile-time security barriers protecting trusted system code from untrusted mobile code [19]. As a downside, restricted accessibility privileges naturally obstruct possibilities for refactorings, as CRA updates (e.g., moving members [34]) may be either rejected by those constraints, or require relaxing accessibility privileges, thus increasing the attack surface [35].

In this paper, we present a search-based technique to find optimal sequences of refactorings for object-oriented Java-like programs, by explicitly taking accessibility constraints into account. To this end, we do not propose novel refactoring operations, but rather apply established ones and control their impact on attack-surface metrics. We focus on *MoveMethod* refactorings which have been proven effective for improving CRA metrics [34], in combination with operations for on-demand strengthening and relaxing of accessibility declarations [38]. As objectives, we consider **(O1)** *elimination of design flaws*, particularly, **(O1a)** optimization of object-oriented coupling/cohesion metrics [5,6] and **(O1b)** avoidance of anti-patterns, namely *The Blob*, **(O2)** *preservation of original program design* (i.e., minimizing the number of change operations), and **(O3)** *attack-surface minimization*. Our model-based tool implementation, called GOBLIN, represents individuals (i.e., intermediate refactoring results) as program-model instances complying with an EMF meta-model for Java-like programs [33]. Hence, instead of regenerating source code after every single refactoring step, we apply and evaluate sequences of refactoring operations, specified as model-transformation rules in Henshin [2], on the program model. To this end,

**Fig. 1.** UML class diagram of MailApp

we apply MOMoT [11], a generic framework for search-based model transformations. Our experimental evaluation results gained from applying GOBLIN as well as the recent tools JDeodorant [12] and Code-Imp [27] to a collection of real-world Java programs provide us with in-depth insights into the subtle interplay between traditional code-quality metrics and attack-surface metrics. Our tool and all experiment results are available on the GitHub site of the project<sup>1</sup>.

## **2 Background and Motivation**

We first introduce a running example to provide the necessary background and to motivate the proposed refactoring methodology.

*Running Example.* We consider a (simplified) e-mail client, called MailApp, implemented in Java. Figure 1 shows the UML class diagram of MailApp, where security-critical extensions (in gray) will be described below. We use the stereotype «pkg : name» to annotate classes with package declarations. The central class MailApp is responsible for handling objects of the classes Message and Contact, both encapsulating application data and the operations to access those attributes. The text of a message may be formatted as plain String, or it may be converted into HTML using method plainToHtml().

*Design Flaws in Object-Oriented Programs.* The over-centralized architectural design of MailApp, consisting of a predominant *controller class* (MailApp) intensively accessing inactive *data classes* (Message and Contact), is frequently referred to as *The Blob* anti-pattern [7]. As a consequence, method plainToHtml() in class MailApp frequently calls method getPlainText() in class Message across

<sup>1</sup> https://github.com/Echtzeitsysteme/goblin.

class- and even package-boundaries. *The Blob* and other *design flaws* are widely considered harmful with respect to software quality in general and program maintainability in particular [7]. For instance, suppose a developer extends MailApp by (1) adding further classes SecureMailApp and RsaAdapter for encrypting and signing messages, and by (2) extending class Contact with public RSA key handling: method findKey() searches for public RSA keys of contacts by repeatedly calling method findKeyFromServer() with the URL of available key servers. This *program evolution* further decays the already flawed design of MailApp, as class SecureMailApp may be considered a second instance of *The Blob* anti-pattern: method encryptMessage() of class SecureMailApp intensively calls method findKey() in class Contact. This example illustrates a well-known dilemma of agile program development in an object-oriented world: *Class-Responsibility Assignment* decisions may become unbalanced over time, due to unforeseen changes crosscutting the initial program design [31]. As a result, the majority of object-oriented design flaws like *The Blob* anti-pattern are mainly caused by low cohesion/high coupling ratios within/among classes and their members [5,6].

*Refactoring of Object-Oriented Programs.* Object-oriented *refactorings* constitute an emerging and widely used counter-measure against design flaws [13]. Refactorings impose systematic, semantic-preserving program transformations for continuously improving code-quality measures of evolving source code. For instance, the *MoveMethod* refactoring is frequently used to update CRA decisions after program changes, by moving method implementations between classes [34]. Applied to our example, a developer may (manually) conduct two refactorings, **R1** and **R2**, to counteract the aforementioned design flaws:

**(R1)** move method plainToHtml() from class MailApp to class Message, and

**(R2)** move method encryptMessage() from class SecureMailApp to class Contact.

However, concerning programs of realistic size and complexity, tool support for (semi-)automated program refactorings becomes increasingly indispensable. The major challenges in finding effective sequences of object-oriented refactoring operations consist in *detecting* flawed program parts to be refactored, as well as in *recommending* program transformations to apply to those parts to obtain an improved, yet behaviorally equivalent program design. The complicated nature of the underlying optimization problem stems from several phenomena.


Further research, especially on the last phenomenon, is required to understand to what extent a refactoring actually alters (in a potentially critical way) the original program. For instance, for refactoring **R2** to yield a correct result, it requires relaxing declared *accessibility constraints*: method encryptMessage() has to become public instead of protected after being moved into class Contact to remain accessible to method sendMessage(), and, conversely, method getPrivateKey() has to become public instead of private to remain accessible to encryptMessage(). Although these small changes do not affect the functionality of the original program, they may have a negative impact on extra-functional properties like program security. Therefore, the number of invalid solutions highly depends on the interaction between constraints and repair mechanisms.

*Attack Surface of Object-Oriented Programs.* The *attack surface* of a program comprises all conventional ways of entering a software system from outside, such that a larger surface increases the danger of exploiting vulnerabilities (either unintentionally by some user, or intentionally by an attacker) [20]. Concerning Java-like programs in particular, explicit restrictions of the accessibility of class members provide an essential mechanism to control the attack surface. Hence, refactoring **R2** should definitely be deemed harmful, as the enforced relaxations of accessibility constraints, especially those of the indeed security-critical method getPrivateKey(), unnecessarily widen the attack surface of the original program. In contrast, refactoring **R1** should be appreciated, as it even narrows the attack surface by setting method plainToHtml() from public to private.
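A crude version of such a count can be sketched as follows; this is an illustrative stand-in (counting members declared protected or public), not the attack-surface metric of [20], and all names are hypothetical.

```python
# Hedged sketch: count members whose declared accessibility exposes them
# beyond their class, using the ordering private < default < protected
# < public. The weighting (>= protected counts as exposed) is illustrative.

ACCESS_ORDER = ["private", "default", "protected", "public"]

def attack_surface(members):
    # members: member name -> declared accessibility
    threshold = ACCESS_ORDER.index("protected")
    return sum(1 for acc in members.values()
               if ACCESS_ORDER.index(acc) >= threshold)

before = {"plainToHtml": "public", "getPrivateKey": "private"}
after_R2 = {"plainToHtml": "public", "getPrivateKey": "public"}
# R2 forces getPrivateKey to become public: the surface grows.
assert attack_surface(after_R2) > attack_surface(before)
```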

*Challenges.* As illustrated by our example, the attack surface of a program is a crucial, but as yet unexplored, factor when searching for reasonable object-oriented program refactorings. However, if not treated with special care, accessibility constraints may seriously obstruct program maintenance by eagerly suppressing any refactoring opportunity in advance. We therefore pursue a model-based methodology for automating the search for optimal sequences of program refactorings that explicitly takes accessibility constraints into account. We formulate the underlying problem as a constrained multi-objective optimization problem (MOOP) incorporating explicit control and minimization of attack-surface metrics. This formulation allows us to leverage search-based model-transformation capabilities for approximating optimal solutions.

## **3 Search-Based Program Refactorings with Attack-Surface Control**

We now describe our model-based framework for identifying (presumably) optimal sequences of object-oriented refactoring operations. To explicitly control (and minimize) the impact of recommended refactorings on the attack surface, we extend an existing EMF meta-model for representing Java-like programs with accessibility information and respective constraints. Based on this model, refactoring operations are defined as model-transformation rules which allow us to apply search-based model-transformation techniques to effectively explore candidate solutions of the resulting MOOP.

#### **3.1 Program Model**

In the context of model-based program transformation, a *program model* serves as a unified program representation (1) constituting an appropriate level of abstraction, comprising only those (syntactic) program entities relevant for a given task, and (2) including additional (static-semantic) information required for that task [24]. Concerning program models for model-based object-oriented program refactorings in particular, the corresponding model-transformation operations are mostly applied at the level of classes and members, whereas more fine-grained source-code details can be neglected. Instead, program elements are augmented with additional (static-semantic) dependencies to other entities that are crucial for refactoring operations to yield correct results [24–26]. Here, we employ and enhance the program model proposed by Peldszus et al. [33] for automatically detecting structural anti-patterns (cf. **O1b**) in Java programs. Their incremental detection process also includes the evaluation of coupling and cohesion metrics (cf. **O1a**), and both the metric values and the detected anti-patterns are added as additional information to the program model.

**Fig. 2.** Excerpt of the program-model representation of MailApp

Figure 2 shows an excerpt of the program-model representation for MailApp including the classes MailApp, Message, SecureMailApp, and Contact together with a selection of their method definitions. Each program element is represented by a white rectangle labeled with name : type. The available types of program entities and possible (syntactic and semantic) dependencies (represented by arrows) between respective program elements are defined by a *program meta-model*, serving as a template for valid program models [26,37]. The program model comprises as first-class entities the classes (type TClass)

**Fig. 3.** Model-transformation rule for *MoveMethod* refactoring

together with their members as declared in the program. The representation of methods is split into signatures (type TMethodSignature) and definitions (type TMethodDefinition) to capture overloading/overriding dependencies among method declarations (e.g., overriding of method sendMessage() imposes one shared method signature, but two different method definitions). Solid arrows correspond to syntactic dependencies between program elements such as aggregation (unlabeled) and inheritance (label extends) and relations between method signatures and their definitions, whereas dashed arrows represent (static) semantic dependencies (e.g., arrows labeled with call denote caller-callee relations between methods).
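The split between signatures and definitions can be mirrored with a few plain dataclasses; this is an illustrative sketch of the model excerpt (types and fields are assumptions, not the exact Ecore meta-model of [33]).

```python
# Hedged sketch of the program-model excerpt: TClass, TMethodSignature,
# TMethodDefinition. Overriding definitions share one signature object.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TMethodSignature:
    name: str

@dataclass
class TMethodDefinition:
    signature: TMethodSignature
    accessibility: str = "public"                     # declared accessibility
    calls: List["TMethodDefinition"] = field(default_factory=list)

@dataclass
class TClass:
    name: str
    parent: Optional["TClass"] = None                 # "extends" arrow
    definitions: List[TMethodDefinition] = field(default_factory=list)

# Overriding of sendMessage(): one shared signature, two definitions.
send = TMethodSignature("sendMessage")
mail_app = TClass("MailApp", definitions=[TMethodDefinition(send)])
secure = TClass("SecureMailApp", parent=mail_app,
                definitions=[TMethodDefinition(send)])
assert secure.definitions[0].signature is mail_app.definitions[0].signature
```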

*Design-Flaw Information.* The program model further incorporates information gained from *design-flaw detection* [33], to identify program parts to be refactored. In our example, design-flaw annotations (in gray) are attached to affected program elements, namely classes Message and Contact constitute *data classes* and classes MailApp and SecureMailApp constitute *controller classes*, which lead to two instances of the anti-pattern *The Blob*.

*Accessibility Information.* To reason about the impact of refactorings on the attack surface of programs, we extend the program model of Peldszus et al. by accessibility information. Our extensions include the attribute accessibility denoting the *declared accessibility* of entities as shown for method definitions in Fig. 2. In addition, our model comprises package declarations of classes (type TPackage) to reason about package-dependent accessibility constraints.

## **3.2 Model-Based Program Refactorings**

Based on the program-model representation, refactoring operations by means of semantic-preserving program transformations can be concisely formalized in a declarative manner in terms of *model-transformation rules* [26]. A *model-transformation rule* specifies a generic change pattern consisting of a *left-hand side* pattern to be matched in an input model for applying the rule, and a *right-hand side* replacing the occurrence of the left-hand side to yield an output model. Here, we focus on (sequences of) *MoveMethod* refactorings, as recent research has shown that *MoveMethod* refactorings are considerably effective in improving CRA measures in flawed object-oriented program designs [34]. Figure 3 shows a (simplified) rule for *MoveMethod* refactorings defined on our program meta-model, using a compact visual notation superimposing the left- and right-hand sides. The rule takes a source class srcClass, a target class trgClass and a method signature methodSig as parameters, *deletes* the containment arrow between the source class and the signature (red arrow annotated with **--**) and *creates* a new containment arrow from the target class (green arrow annotated with **++**), but only if such an arrow does not already exist before rule application. The latter *(pre-)condition* is expressed by a *forbidden* (crossed-out) arrow. For a comprehensive list of all necessary pre-conditions (or, *pre-constraints*), we refer to [38].
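The delete/create/forbid pattern of the rule can be sketched on a toy program model; this is an illustrative imitation of the rule's shape (dict of class names to signature sets), not the Henshin rule itself.

```python
# Hedged sketch of the MoveMethod rule on a toy program model:
# class name -> set of method signatures it contains.

def move_method(model, src, trg, sig):
    if sig not in model[src]:
        raise ValueError("left-hand side does not match")
    if sig in model[trg]:              # forbidden (crossed-out) arrow
        raise ValueError("pre-condition violated: arrow already exists")
    model[src].remove(sig)             # "--" delete containment arrow
    model[trg].add(sig)                # "++" create containment arrow
    return model

m = {"MailApp": {"plainToHtml"}, "Message": {"getPlainText"}}
move_method(m, "MailApp", "Message", "plainToHtml")  # refactoring R1
assert m == {"MailApp": set(), "Message": {"getPlainText", "plainToHtml"}}
```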

*Accessibility Post-constraints.* Besides pre-constraints, refactoring operations must satisfy further *post-constraints*, evaluated after rule application, in order to yield correct results, especially concerning accessibility constraints as declared in the original program (i.e., member accesses like method calls in the original program must be preserved after refactoring [24]). As an example, a (simplified) post-constraint for the *MoveMethod* rule is shown on the right of Fig. 3 using OCL-like notation. Members refers to the collection of all class members in the program. The post-constraint utilizes the helper function reqAcc(m) to compute the *required access modifier* of class member m and checks whether the declared accessibility of m is at least as generous as required (based on the canonical ordering private < default < protected < public) [38].

For instance, if refactoring **R2** is applied to MailApp, method encryptMessage() violates this post-constraint, as the call from sendMessage(), located in another package, requires accessibility public, whereas the declared accessibility is protected. Instead of immediately rejecting refactorings like **R2**, we introduce an *accessibility-repair operation* of the form m.accessibility := reqAcc(m) for each member violating the post-constraint, which therefore causes a *relaxation* of the attack surface. However, this repair is not always possible, as relaxations may lead to incorrect refactorings that alter the original program semantics (e.g., due to method overriding/overloading [38]). In contrast, refactoring **R1** (i.e., moving plainToHtml() to class Message) satisfies the post-constraint, as the required accessibility of plainToHtml() becomes private, whereas its declared accessibility is public. In such cases, we may also apply the operation m.accessibility := reqAcc(m), now leading to a *reduction* of the attack surface. Different strategies for attack-surface reduction are investigated in Sect. 4.
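The ordering check and the repair operation can be sketched as follows; the required modifier is passed in as a given value, since computing reqAcc(m) needs the full access analysis of [38]:

```python
# Sketch of the post-constraint check and the repair operation.
ORDER = {"private": 0, "default": 1, "protected": 2, "public": 3}

def post_constraint_ok(declared, required):
    """Declared accessibility must be at least as generous as required."""
    return ORDER[declared] >= ORDER[required]

def repair(declared, required):
    """m.accessibility := reqAcc(m): a relaxation if the required level
    is more generous than the declared one, a reduction otherwise."""
    return required

# R2: encryptMessage() is declared protected, but a cross-package call
# requires public -- a violation, repaired by relaxing to public.
violated = not post_constraint_ok("protected", "public")
repaired = repair("protected", "public")
```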

#### **3.3 Optimization Objectives**

We now describe how objectives **(O1)–(O3)** are evaluated on the program model; the resulting values serve as fitness values in a search-based setting.

*Coupling/Cohesion.* Concerning **(O1a)**, coupling and cohesion metrics are well-established quality measures for CRA decisions in object-oriented program design [4]. In our program model, *coupling* (**COU**) is related to the overall number of member accesses (e.g., *call*-arrows) across class boundaries [5], and for measuring *cohesion*, we adopt the well-known **LCOM5** metric, which quantifies the *lack* of cohesion among members within classes [17]. While there are other metrics that indicate good CRA decisions, such as **Number of Children**, these metrics cannot be influenced by *MoveMethod* refactorings and are therefore not used in this paper [9]. Consequently, good CRA decisions exhibit low values for both **COU** and **LCOM5**. Accordingly, refactorings **R1** and **R2** both improve the values of **COU** (i.e., by eliminating inter-class *call*-arrows) and **LCOM5** (i.e., by moving methods into the classes where they are called).
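Both metrics can be sketched on the arrow-based program model; we assume the common Henderson-Sellers form of LCOM5 here, which may differ in detail from the variant of [17] used in the paper:

```python
# Sketch of the two design objectives on the program model.
def coupling(calls, owner):
    """COU: number of call arrows whose caller and callee live in
    different classes; owner maps a method to its declaring class."""
    return sum(1 for caller, callee in calls if owner[caller] != owner[callee])

def lcom5(methods, attrs, accesses):
    """LCOM5 = ((1/a) * sum_j mu(A_j) - m) / (1 - m), where mu(A_j) is
    the number of methods accessing attribute A_j; 0 = fully cohesive."""
    m, a = len(methods), len(attrs)
    if m <= 1 or a == 0:
        return 0.0
    mu_sum = sum(1 for meth in methods for att in attrs
                 if (meth, att) in accesses)
    return (mu_sum / a - m) / (1 - m)
```

Moving a method into the class it calls removes a cross-class call arrow, lowering COU, and tends to raise the number of intra-class attribute accesses, lowering LCOM5.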

*Anti-patterns.* Concerning **(O1b)**, we limit our considerations to occurrences of the *The Blob* anti-pattern for convenience. We employ the detection approach of Peldszus et al. [33] and take as objective the minimization of the number of *The Blob* instances (denoted **#BLOB**). For instance, for the original MailApp program (white parts in Fig. 1), we have **#BLOB** = 1, while for the extended version (white and gray parts), we have **#BLOB** = 2. Refactoring **R1** may help to remove the first occurrence, and **R2** potentially removes the second one.

*Changes.* Concerning **(O2)**, real-life studies show that, for refactoring recommendations to be accepted by users, they must not deviate too far from the original design [8]. Here, we consider the *number* of *MoveMethod* refactorings (denoted **#REF**) to be performed in a recommendation as a further objective to be minimized. For example, solely applying **R1** results in **#REF** = 1, whereas a sequence of **R1** followed by **R2** most likely imposes more design changes (i.e., **#REF** = 2). In contrast, accessibility-repair operations do not affect the value of **#REF**, but rather impact objective **(O3)**.

*Attack Surface.* Concerning **(O3)**, guidelines for secure object-oriented programming encourage developers to grant as few access privileges as possible to any accessible program element in order to minimize the attack surface [19]. In our program model, the attack-surface metric (denoted **AS**) is measured as

$$\mathbf{AS} = \sum_{m \in \text{Members}} \omega(m.\mathit{accessibility}),\tag{1}$$

where the weighting function ω : *Mod* → ℕ₀ on the set *Mod* of accessibility modifiers may, for instance, be defined as ω(private) = 0, ω(default) = 1, ω(protected) = 2, ω(public) = 3. Hence, a lower value corresponds to a smaller attack surface. For example, **R1** enables an attack-surface reduction by setting plainToHtml() from public to private, which decreases **AS** by 3. In contrast, **R2** involves a repair step setting encryptMessage() from protected to public, which increases **AS** by 1. Whether such negative impacts of refactorings on **(O3)** are outweighed by simultaneous improvements for other objectives depends, among other things, on the actual weighting ω applied. For instance, each further modifier public considerably opens the attack surface and should therefore be penalized with a higher weighting value than the other modifiers (cf. Sect. 4).
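Equation (1) with the example weighting is a one-liner; representing members as a name-to-modifier map is an illustrative simplification:

```python
# Equation (1) with the example weighting from the text.
OMEGA = {"private": 0, "default": 1, "protected": 2, "public": 3}

def attack_surface(members, omega=OMEGA):
    """AS: sum of omega(m.accessibility) over all class members; members
    maps each member name to its declared accessibility modifier."""
    return sum(omega[acc] for acc in members.values())

before = {"plainToHtml()": "public", "encryptMessage()": "protected"}
after = {"plainToHtml()": "private", "encryptMessage()": "protected"}
# R1's reduction step lowers AS by omega(public) - omega(private) = 3.
```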

#### **3.4 Search-Based Optimization Process**

Our tool for recommending optimized object-oriented refactoring sequences, called GOBLIN<sup>2</sup>, is based on a combination of search-based multi-objective

<sup>2</sup> Goblin is a supervillain and Head of National *Security* in the Marvel universe [3]. GOBLIN also stands for *G*eneric *O*bjective-*B*ased *L*ayout *I*mprovements for *N*on-designs.

optimization techniques using genetic algorithms and model transformations, built on the MOMoT framework [11]. Figure 4 shows an overview of GOBLIN. First, the input Java program is translated into our program model [33]. This *original program model*, together with its objective values for **(O1)**−**(O3)** (i.e., its *fitness* values), serves as a baseline for evaluating the improvements obtained by candidate refactorings. The built-in genetic algorithm (NSGA-III) of MOMoT is initialized with an *initial population* of a fixed number of *individuals* serving as *generation* 0, where each individual constitutes a *sequence* of at least one and at most a fixed maximum number of *MoveMethod* rule applications (cf. Fig. 3) to the original program model. Thus, each individual corresponds to a refactored version of the original program model, on which the resulting fitness values are evaluated. The refactored program model is obtained by applying the given sequence of refactorings to the original program model. Steps within a sequence that are not applicable to an intermediate model (e.g., due to unsatisfied pre-conditions) are skipped, whereas steps producing infeasible results (e.g., due to unsatisfied and non-repairable post-conditions) cause the entire individual to become invalid (and thus to be removed from the population).
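The evaluation semantics of an individual (skip inapplicable steps, invalidate on non-repairable results) can be sketched as follows; representing a step as a tuple of callbacks is purely illustrative, not GOBLIN's actual API:

```python
# Sketch: evaluating an individual, i.e., a sequence of refactoring
# steps, each given as (applicable, apply, post_ok, repair) callbacks.
def apply_sequence(model, steps):
    for applicable, apply, post_ok, repair in steps:
        if not applicable(model):
            continue                       # unsatisfied pre-condition: skip
        candidate = apply(model)
        if not post_ok(candidate):
            candidate = repair(candidate)  # try accessibility repair
            if candidate is None:
                return None                # individual becomes invalid
        model = candidate
    return model
```

On toy integer "models", a step whose pre-condition fails is silently skipped, while a step whose post-condition fails and cannot be repaired invalidates the whole sequence.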

**Fig. 4.** Architecture of the GOBLIN tool

For deriving generation i + 1 from generation i, NSGA-III first creates a set of new individuals using random *crossover* and *mutation* operators. As indicated in Fig. 4, a crossover splits and recombines two individuals into a new one, while a mutation generates a new individual by injecting small changes into an existing one. Afterwards, in the *selection* phase, individuals from the overall population (the original and the newly created individuals) are selected into the next generation, depending on their *fitness* values. For more details on NSGA-III, we refer to [15,28]. The search process terminates when a maximum number of generations (or individuals, respectively) has been reached, resulting in a Pareto front of non-dominated individuals, each constituting a refactoring recommendation [11].
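The generational cycle can be sketched schematically; note that NSGA-III's non-dominated sorting and reference-point niching are replaced here by a plain single-objective sort, so this only illustrates the create/evaluate/select loop, not the actual algorithm:

```python
import random

# Schematic create/evaluate/select cycle of a generational GA.
def evolve(population, fitness, crossover, mutate, generations, rng):
    size = len(population)
    for _ in range(generations):
        offspring = []
        while len(offspring) < size:
            a, b = rng.sample(population, 2)
            offspring.append(mutate(crossover(a, b)))          # variation
        # Elitist selection: keep the fittest of parents + offspring.
        population = sorted(population + offspring, key=fitness)[:size]
    return population
```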

## **4 Experimental Evaluation**

We now present experimental evaluation results gained from applying GOBLIN to a collection of Java programs. First, to investigate the impact of *attack-surface reduction* on the resulting refactoring recommendations, we consider the following *reduction strategies*, which differ in when attack-surface reduction is performed during search-space exploration (where a step means a refactoring step):


We are interested in the impact of each strategy on the trade-off between *attack-surface* metrics and design-quality metrics (i.e., do the recommended refactoring sequences tend to optimize the attack-surface aspect more, or the program design?). We quantify the *attack-surface impact* (**ASI**) and *design impact* (**DI**) of a refactoring recommendation rr as follows:

$$\mathbf{ASI}(rr) = \frac{\mathbf{AS}(rr) - \mathbf{AS}(\text{orig})}{\mathbf{AS}(\text{orig})} \tag{2}$$

$$\mathbf{DI}(rr) = \frac{\mathbf{COU}(rr) - \mathbf{COU}(\text{orig})}{\mathbf{COU}(\text{orig})} + \frac{\mathbf{LCOM5}(rr) - \mathbf{LCOM5}(\text{orig})}{\mathbf{LCOM5}(\text{orig})} \tag{3}$$

where orig refers to the original program. Second, we consider the impact of different weightings ω on the attack-surface metric **AS**. As modifier public has a considerably negative influence on the attack surface, we study the impact of increasing the penalty for public in ω relative to the other modifiers. We are especially interested in whether there exists a threshold beyond which any design-improving refactoring would be rejected as security-critical. Finally, we compare GOBLIN to the recent refactoring tools JDeodorant and CODe-Imp, neither of which explicitly considers attack-surface metrics as an optimization objective so far. To summarize, we aim to answer the following research questions:


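The impact measures (2) and (3) translate directly into code; a minimal sketch, where negative values mean a reduced attack surface or an improved design:

```python
# Equations (2) and (3), transcribed directly.
def asi(as_rr, as_orig):
    """Relative attack-surface impact of a recommendation."""
    return (as_rr - as_orig) / as_orig

def di(cou_rr, cou_orig, lcom5_rr, lcom5_orig):
    """Relative design impact: summed relative change of COU and LCOM5."""
    return ((cou_rr - cou_orig) / cou_orig
            + (lcom5_rr - lcom5_orig) / lcom5_orig)
```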
#### **4.1 Experiment Setup and Results**

We conducted our experiments on an established corpus of real-life open-source Java programs of various sizes [33,39], listed in Table 1 (with lines of code LOC, number of packages #P, number of classes #C and number of methods #M). For a compact presentation, we divide the corpus into three program-size categories (*small, mid-sized, large*), indicated by horizontal lines in Table 1. All experiments were executed on a Windows-Server-2016 machine with a 2.4 GHz quad-core CPU, 32 GB RAM and JRE 1.8. We used the default genetic-algorithm configuration of MOMoT in all our experiments [11]: termination after 10,000 individual evaluations, population size of 100, and each individual consisting of at most 10 refactorings. We applied the metrics for **(O1)**−**(O3)** (cf. Sect. 3.3) to compute fitness values. GOBLIN requires 25 minutes to compute a set of refactoring recommendations for the smallest program, and up to several hours for large programs, which is acceptable for a search-based (off-line) optimization approach. We selected a representative set of computed recommendations, which we manually checked for program correctness and impact.

For **(RQ1)**, we measured **ASI** and **DI** values for two runs of GOBLIN (cf. Figs. 6a, b, c, d, e and f). Figures 6a and b (first row, side by side) show a box-plot for each Strategy (1−3) for the *small* programs of our corpus (#iSj referring to program number i in Table 1 and Strategy j). The box-plots show the distribution of **ASI** (Fig. 6a) and **DI** (Fig. 6b) values over the refactoring recommendations of GOBLIN. The figure pairs 6c−6d and 6e−6f show the same data for *mid-sized* and *large* programs, respectively. For **(RQ2)**, we used Strategy 3 from **(RQ1)** and varied the function ω to study different penalties for modifier public. Figure 5 plots the (minimal) values of **ASI** and **DI** depending on ω(public) (from 3 up to 100). Regarding **(RQ3)**, we compare the results of GOBLIN to those of the state-of-the-art refactoring recommender tools JDeodorant [12] and CODe-Imp [27]. Refactorings proposed by JDeodorant pursue the single optimization objective of eliminating specific anti-patterns through heuristic refactoring strategies. In particular, JDeodorant employs *ExtractClass* [13] to eliminate *The Blob* (also called *GodClass*) by separating parts of the controller class into a freshly created class. Thus, each recommendation of JDeodorant subsumes multiple *MoveMethod* refactorings (into the fresh target class). In contrast, CODe-Imp pursues a search-based approach, including a variety of



**Table 1.** Evaluation corpus

**Fig. 5.** Minimal **ASI** and **DI** values for different weightings of public

**Fig. 6.** Measurement results

refactoring operations and design-quality metrics. For a comparison to GOBLIN, we used the *MoveMethod* refactoring of CODe-Imp, which produces one sequence of *MoveMethod* refactorings per run. Figures 6g and h contain comparisons of **ASI** and **DI** values, respectively, for our corpus (excluding QuickUML due to its comparatively high variation). For each program, the upper box-plot shows the results for GOBLIN and the lower one those for JDeodorant. CODe-Imp only produced results for QuickUML and JUnit (10 runs each), terminating without any result for the others.

#### **4.2 Discussion**

Concerning **(RQ1)**, Strategy 3 leads to the best attack-surface impact for *small* programs (with negligible execution-time overhead), while even slightly improving the design impact. Although this clear advantage dissolves for *mid-sized* and *large* programs, it still contributes to a reasonable trade-off, while attack-surface reductions tend to hamper design improvements, as expected. Calculating the Pearson correlation [32] between **ASI** and **DI** shows that (1) the strategy does not influence the correlation and (2) for *small* programs, GOBLIN finds refactorings which are beneficial for both the attack surface and the program design.
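For reference, the sample Pearson correlation used here is the standard one; a minimal sketch, where the series would be the per-recommendation **ASI** and **DI** values:

```python
import math

# Sample Pearson correlation coefficient between two metric series.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```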

Concerning **(RQ2)**, Fig. 5 shows that a higher value for ω(public) leads to a better attack-surface impact, as attack-surface-critical refactorings are less likely to survive throughout the generations. The improvement in **ASI** is remarkably steep from ω(public) = 3 to ω(public) = 7, but slows to linear growth for higher values. Regarding the design impact, the best achieved **DI** also improves linearly up to ω(public) = 10, but no further **DI** improvements emerge beyond that. In higher value ranges (>70), **DI** reaches a threshold and degrades afterwards.

Regarding **(RQ3)**, the *The Blob* elimination strategy of JDeodorant necessarily increases attack surfaces, as calls to extracted methods have to access the new class, which raises accessibility at least to default. As shown in Fig. 6g, there are almost no refactorings proposed by JDeodorant with a positive attack-surface impact. Surprisingly, JDeodorant also achieves a less beneficial design impact than GOBLIN, with a strong correlation between **ASI** and **DI**. Our unfortunately very limited set of observations for CODe-Imp shows that, due to the similar search technique, the refactorings found by CODe-Imp and GOBLIN are quite similar. Nevertheless, due to the different focus of objectives, CODe-Imp tends to increase attack surfaces. Although differences in metric definitions forbid any definite conclusions, CODe-Imp does not achieve any design improvements according to our metrics.

To summarize, our experimental results demonstrate that the attack-surface impacts of refactorings clearly deserve more attention in the context of refactoring recommendations, revealing a practically relevant trade-off (or even contradiction) between traditional design-improvement efforts and extra-functional, particularly security, aspects. Our experiments further show that existing tools are largely unaware of the attack-surface impacts of the refactorings they recommend.

## **5 Related Work**

*Automating Design-Flaw Detection and Refactorings.* Marinescu [21] proposes a metric-based design-flaw detection approach similar to that of Peldszus et al. [33], which is used in our work. However, neither work deals with the elimination of detected flaws. The DECOR framework additionally includes recommendations for eliminating anti-patterns; in contrast to our work, however, those recommendations remain rather atomic and local. Closer to our approach, Fokaefs et al. [12] and Tsantalis et al. [40] consider (semi-)automatic refactorings to eliminate anti-patterns like *The Blob* in the tool JDeodorant. Nevertheless, they focus on optimizing a single objective and do not consider multiple, especially extra-functional, aspects like security metrics, as our approach does.

*Multi-objective Search-Based Refactorings.* O'Keeffe and Ó Cinnéide use search-based refactorings in their tool CODe-Imp [28], including various standard refactoring operations and different quality metrics as objectives [27]. Seng et al. consider a search-based setting where, similar to our approach, compound refactoring recommendations comprise atomic *MoveMethod* operations. Harman and Tratt also investigate a Pareto front of refactoring recommendations including various design objectives [16], and, more recently, Ouni et al. conducted a large-scale real-world study on multi-objective search-based refactoring recommendations [30]. However, none of these approaches investigates the impact of refactorings on security-relevant metrics as our approach does.

*Security-Aware Refactorings.* Steimann and Thies were the first to propose a comprehensive set of accessibility constraints for refactorings covering full Java [38]. Although their constraints are formally founded, they do not consider software metrics to quantify the attack-surface impact of (sequences of) refactorings. Alshammari et al. propose an extensive catalogue of software metrics for evaluating the impact of refactorings on the security of object-oriented programs [1]. Similarly, Maruyama and Omori propose a technique [22] and tool [23] for checking whether a refactoring operation raises security issues. However, all these approaches are concerned with security and accessibility constraints of specific refactorings; they do not investigate those aspects in a multi-objective program-optimization setting. The problem of measuring attack surfaces as a metric for evaluating secure object-oriented programming policies has been investigated by Zoller and Schmolitzky [41] and by Manadhata and Wing [20]. Nevertheless, those and similar metrics have not yet been utilized as an optimization objective for program refactoring. Finally, Ghaith and Ó Cinnéide consider a catalogue of security-relevant metrics to recommend refactorings using CODe-Imp, but they, too, consider security as a single objective [14].

## **6 Conclusion**

We presented a search-based approach to recommend sequences of refactorings for object-oriented Java-like programs, taking the attack surface into account as an additional optimization objective. Our model-based methodology, implemented in the tool GOBLIN, utilizes the MOMoT framework, including the genetic algorithm NSGA-III, for search-space exploration. Our experimental results, gained from applying GOBLIN to real-world Java programs, provide detailed insights into the impact of attack-surface metrics on the fitness values of refactorings and the resulting trade-off with competing design-quality objectives. As future work, we plan to incorporate additional domain knowledge about critical code parts to further control security-aware refactorings.

**Acknowledgements.** This work was partially funded by the Hessian LOEWE initiative within the Software-Factory 4.0 project as well as by the German Research Foundation (DFG) in the Priority Programme SPP 1593: Design For Future - Managed Software Evolution (LO 2198/2-1, JU 2734/2-1).

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Effective Analysis of Attack Trees: A Model-Driven Approach**

Rajesh Kumar<sup>1(B)</sup>, Stefano Schivo<sup>1</sup>, Enno Ruijters<sup>1</sup>, Buğra Mehmet Yildiz<sup>1</sup>, David Huistra<sup>1</sup>, Jacco Brandt<sup>1</sup>, Arend Rensink<sup>1</sup>, and Mariëlle Stoelinga<sup>1,2</sup>

<sup>1</sup> Formal Methods and Tools, University of Twente, Enschede, The Netherlands {r.kumar,s.schivo,e.j.j.ruijters,b.m.yildiz,d.j.huistra,a.rensink,m.i.a.stoelinga}@utwente.nl, j.h.brandt@student.utwente.nl <sup>2</sup> Department of Software Science, Radboud University, Nijmegen, The Netherlands

**Abstract.** Attack trees (ATs) are a popular formalism for security analysis, and numerous variations and tools have been developed around them. These were mostly developed independently, and offer little interoperability or ability to combine various AT features.

We present ATTop, a software bridging tool that enables automated analysis of ATs using a model-driven engineering approach. ATTop fulfills two purposes: (1) it facilitates interoperation between several AT analysis methodologies and the resulting tools (e.g., ATE, ATCalc, ADTool 2.0); (2) it can perform a comprehensive analysis of attack trees by translating them into timed automata, analyzing these with the popular model checker Uppaal, and translating the analysis results back to the original ATs. Technically, our approach uses various metamodels to provide a unified description of AT variants. Based on these metamodels, we perform model transformations that allow various analysis methods to be applied to an AT and the results to be traced back to the AT domain. We illustrate our approach on the basis of a case study from the AT literature.

## **1 Introduction**

Formal methods are often employed to support software engineers in particularly complex tasks: model-based testing, type checking and extended static checking are typical examples that help in developing better software faster. This paper is about the reverse direction: showing how software engineering can assist formal methods in developing complex analysis tools.

More specifically, we reap the benefits of model-driven engineering (MDE) to design and build a tool for analyzing attack trees (ATs). ATs [25,31] are a popular formalism for security analysis, allowing convenient modeling and analysis of complex attack scenarios. ATs have become part of various system engineering frameworks, such as UMLsec [16] and SysMLsec [27].

Attack trees come in a large number of variations, employing different security attributes (e.g., attack time, costs, resources, etc.) as well as modeling constructs (e.g., sequential vs. parallel execution of scenarios). Each of these variations comes with its own tooling; examples include ADTool [12], ATCalc [2], and Attack Tree Evaluator [5]. This "jungle of attack trees" seriously hampers the applicability of ATs, since it is impossible or very difficult to combine different features and tooling. This paper addresses these challenges and presents ATTop<sup>1</sup>, a software tool that overarches existing tooling in the AT domain.

In particular, the main features of ATTop are (see Fig. 1):


To do so, we again draw on the concepts of MDE and deploy *model transformations*, in two categories: so-called *horizontal* transformations achieve interoperability between existing tools, while *vertical* transformations interpret a model via a set of semantic rules, producing a mathematical model to be analyzed with formal methods.

3. *Bringing the results back to the original domain*. When a mathematical model is analyzed, the analysis result is computed in terms of the mathematical model, and not in terms of the original AT. For example, if AT analysis is done via model checking, a trace in the underlying model (i.e., transition system) can be produced to show that, say, the cheapest attack costs \$100. What security practitioners need, however, is a path or attack vector in the original AT. This interpretation in terms of the original model is achieved by a vertical model transformation in the inverse direction, from the results as obtained in the analysis model back into the AT domain.

These features make ATTop a *software bridging tool*, acting as a bridge between existing AT languages, and between ATs and formal languages.

**Our Contributions.** The contributions of this paper include:


**Overview of Our Approach.** Figure 1 depicts the general workflow of our approach. It shows how ATTop acts as a bridge between different languages and

<sup>1</sup> Available at https://github.com/utwente-fmt/attop.

formalisms. In particular, thanks to horizontal transformations, ATTop makes it possible to use ATs described in different formats, both as an input to other tools and as an input to ATTop itself. In the latter case, vertical transformations are used in order to deal with Uppaal as a back-end tool without exposing ATTop's users to the formal language of timed automata.

**Fig. 1.** Overview of our approach, showing the contributions of the paper in the gray rectangle. Here ATE, ATCalc, ADTool 2.0 are different attack tree analysis tools, each with its own input format. ATTop allows these tools to be interoperable (horizontal model transformations, see Sect. 4.1). ATTop also provides a much more comprehensive AT analysis by automatic translation of attack trees into timed automata and using Uppaal as the back-end analysis tool (vertical transformations, see Sect. 4.2).

**Related Work.** A large number of AT analysis frameworks have been developed, based on lattice theory [18], timed automata [11,21,23], I/O-IMCs [3,22], Bayesian networks [13], Petri nets [8], stochastic games [4,15], etc. We refer to [20] for an overview of AT formalisms. Surprisingly, little effort has been made to provide a security practitioner with a generic tool that integrates the benefits of all these analysis tools.

The use of model transformations with Uppaal was explored in [29] for a range of different formalisms; the Uppaal metamodel presented there is the one we use in ATTop. A related approach for fault trees was proposed in [28]. In [14], the authors manually translate UML sequence diagrams into timed-automata models to analyze timeliness properties of embedded systems. In [1], the OpenMADS tool is proposed, which takes SysML diagrams and UML/MARTE annotations as input and automatically translates them into deterministic and stochastic Petri nets (DSPNs); however, no model-driven engineering technique is applied.

**Organization of the Paper.** In Sect. 2, we describe the background. Section 3 presents the metamodels we use in ATTop, while the model transformations are described in Sect. 4. Section 5 describes the features of ATTop, and in Sect. 6 we show the results of our case study using ATTop. Finally, we conclude the paper in Sect. 7.

## **2 Background**

#### **2.1 Attack Trees in the Security Domain**

Modern enterprises are ever-growing, complex socio-technical systems comprising multiple actors, physical infrastructures, and IT systems. Adversaries can take advantage of this complexity by exploiting multiple security vulnerabilities simultaneously. Risk managers therefore need to predict possible attack vectors in order to combat them. For this purpose, attack trees are a widely used formalism to identify, model, and quantify complex attack scenarios.

Attack trees (ATs) were popularized by Schneier through his seminal paper [31] and were later formalized by Mauw [25]. ATs show how different attack steps combine into a multi-stage attack scenario leading to a security breach. Due to its intuitive representation of attack scenarios, the formalism has been used in both academia and industry to model practical case studies such as ATMs [10], SCADA communication systems [7], etc. Furthermore, the attack tree formalism has also been advocated in the Security Quality Requirements Engineering (SQUARE) methodology [26] for security requirements.

*Example 1.* Figure 2 shows an example AT (adapted from [36]) modeling the compromise of an Internet of Things (IoT) device.

At the top of the tree is the event compromise IoT device, which is refined using *gates* until we reach the atomic steps where no further refinement is desired (the leaves of the tree). The top gate in Fig. 2 is a SAND (*sequential AND*)-gate denoting that, in order for the attack to be successful, the children of this gate must be executed sequentially from left to right. In the example, the attacker first needs to successfully perform access home network, then exploit software vulnerability in IoT device, and then run malicious script. The AND-gate at access home network represents that both gain access to private networks and get credentials must be performed, but these can be performed in any order, possibly in parallel. Similarly, the OR gate at gain access to private networks denotes that its children access LAN and access WLAN can be attempted in parallel, but only one needs to succeed for a successful attack.

Traditionally, each leaf of an attack tree is decorated with a single attribute, e.g., the probability of successfully executing the step, or the cost incurred when taking this step. The attributes are then combined in the analysis to obtain metrics, such as the probability or required cost of a successful attack [19].
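The bottom-up style of such analyses can be sketched for a cheapest-attack-cost metric; the gate semantics (OR takes the cheapest child, AND and SAND need all children, with SAND's sequential ordering not affecting the cost sum) and the numbers are illustrative simplifications, not the semantics of any specific tool:

```python
# Bottom-up computation of the cheapest-attack cost on a toy AT; each
# node is a (kind, payload) pair: a leaf carries its cost, a gate its
# list of children.
def min_cost(node):
    kind, payload = node
    if kind == "leaf":
        return payload
    child_costs = [min_cost(c) for c in payload]
    return min(child_costs) if kind == "OR" else sum(child_costs)

# Hypothetical costs, loosely following the shape of the IoT example.
tree = ("SAND", [
    ("AND", [("OR", [("leaf", 10), ("leaf", 40)]),  # access LAN vs. WLAN
             ("leaf", 25)]),                         # get credentials
    ("leaf", 100),                                   # exploit vulnerability
    ("leaf", 5),                                     # run malicious script
])
```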

Over the years, the AT formalism has been enriched both structurally (e.g., adding more logical gates, countermeasures, ordering relationships; see [20] for

**Fig. 2.** Attack tree modeling the compromise of an IoT device. Leaves are equipped with the cost and time required to execute the corresponding step. The parts of the tree attacked in the cheapest successful attack are indicated by a darker color, with start and end times for the steps in this cheapest attack denoted in red (times correspond to the scenario in Fig. 11). (Color figure online)

an overview) and analytically (e.g., multi-attribute analysis, time- and cost-optimal analysis). This has resulted in a large number of tools (ADTool 2.0 [12], ATCalc [5], ATE [2], etc.), each with its own analysis technique.

Such a wide range of tools can be useful for a security practitioner wishing to perform different kinds of analyses of attack trees. However, this requires preparing the AT for each tool, as each one has its own input format. To overcome the difficulty of orchestrating all these different tools, we propose a single tool, ATTop, which allows ATs to be specified using features of multiple formalisms and supports the analysis of such ATs by different tools without duplicating the AT for each tool.

#### **2.2 Model-Driven Engineering**

*Model-driven engineering* (MDE) is a software engineering methodology that treats models not only as documentation, but also as first-class citizens, to be directly used in the engineering processes [32]. In MDE, a *metamodel* (also referred to as a *domain-specific language*, DSL) is specified as a model at a more abstract level to serve as a language for models [33]. A metamodel captures the concepts of a particular domain with the permitted structure and behavior, to which models must adhere. Typically, metamodels are specified in class diagram-like structures.

MDE provides interoperability between domains (and tools and technologies in these domains) via *model transformations*. The concept of model transformation is shown in Fig. 3. Model transformations map the elements of a source

**Fig. 3.** The concept of *model transformation*

metamodel to the elements of a target metamodel. This mapping is described as a transformation definition, using a language specifically designed for this purpose. The transformation engine executes the transformation definition on the input model and generates an output model.
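The engine concept can be sketched in miniature; representing rules as (guard, produce) pairs is a deliberate simplification of what dedicated transformation languages provide:

```python
# Bare-bones transformation engine: a transformation definition is a
# list of (guard, produce) rules; the engine maps each source element
# to the target element of its first matching rule.
def transform(model, rules):
    output = []
    for element in model:
        for guard, produce in rules:
            if guard(element):
                output.append(produce(element))
                break
    return output
```

A transformation definition for, say, mapping AT gates to automaton states would then just be a list of such rules applied to every model element.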

Adoption of MDE provides various benefits [30,34,37], specifically:


There are a number of tools available for realizing MDE. In this paper, we use the Eclipse Modeling Framework (EMF) [35], a state-of-the-art tool developed for this purpose. EMF provides the *Ecore* format for defining metamodels and has many plug-ins supporting the various functionalities related to MDE. The model transformations we present in this paper were implemented in the Epsilon Transformation Language (ETL) [17], one of the domain-specific languages provided by the Epsilon framework. We have chosen ETL because it is easy to use and allows users to inherit, import, and reuse other Epsilon modules, which increases reusability. We use Java to select and execute the ETL transformations.

#### **3 Metamodels for Attack Tree Analysis**

ATTop uses three different metamodels to represent the attack tree domain concepts, all defined in the Ecore format. These are shown in Figs. 4, 5 and 6, in a notation similar to that of UML class diagrams. They show the domain classes and the edges representing associations between classes. Edges denote reference (→), containment (filled diamond), or supertype (hollow triangle) relations. Multiplicities are denoted between square brackets (e.g., [0..\*] for unrestricted multiplicity).

	- The ATMM represents the core, generic concepts of ATs, resulting in a minimal (and thus clean) metamodel that a domain expert can easily read, understand and use to create models.
	- The ATMM provides a lot of flexibility in specifying the relevant concepts by using string names and generic values. Concepts such as the Connector and the Edge are specified as abstract entities with a set of concrete instances. Therefore, new connectors and edges can easily be added to the metamodel without breaking existing model instances. The metamodel is designed to have good support for model operations, such as traversal of the AT models. From a node, any other node can be reached directly or indirectly following references.
	- The ATMM node and tree attributes offer convenient and generic methods for supporting the results of analysis tools. This allows us to translate results from a formal tool back into the AT domain and associate them with the original AT model (see Sect. 4.4).

Below we discuss these metamodels in more detail.

**1. AT Metamodel (ATMM).** The ATMM metamodel is a combination of two separate metamodels, one representing the attack tree structure (*Structure metamodel*, Fig. 4 left) and the other representing the attack tree attributes (*Values metamodel*, Fig. 4 right). This separation allows us to consider different attack scenarios modeled via the same attack tree, but decorated with different attributes. For example, it is easy to define attribute values based on the attacker type: script kiddie, malicious insider, etc. may all be interested in the same asset, but each of them possesses different access privileges and is equipped with different resources.

*Structure Metamodel.* The structure metamodel, depicted in Fig. 4 on the left, represents the structure of the attack tree. Its main class AttackTree contains a set of one or more Nodes, as indicated by the containment arrow between AttackTree and Node. One of these nodes is designated as the root of the tree, denoted by the root reference. Each Node is equipped with an id, used as a reference during transformation processes. Furthermore, each node has a (possibly empty) list of its parents and children, which allows easy traversal of the AT. A node may have a connector, i.e., a *gate* such as AND, OR, SAND (sequential-AND), etc.
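The structure just described can be mimicked in a few lines of Python (a sketch; class and field names follow Fig. 4, while the helper `add_child` and the example node ids are ours):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    id: str
    connector: Optional[str] = None  # e.g. "AND", "OR", "SAND"; None for leaves
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)

@dataclass
class AttackTree:
    nodes: list
    root: Node

def add_child(parent: Node, child: Node) -> None:
    """Keep the bidirectional parents/children references consistent."""
    parent.children.append(child)
    child.parents.append(parent)

# Tiny tree: a root AND-gate over two basic attack steps.
root = Node("compromise", connector="AND")
a, b = Node("steal_key"), Node("open_door")
add_child(root, a)
add_child(root, b)
tree = AttackTree(nodes=[root, a, b], root=root)
```

The bidirectional references are what make the traversal mentioned above cheap: from any node, one can walk up via `parents` or down via `children`.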

**Fig. 4.** The ATMM metamodel separated into the structure and values metamodels. Some connectors, types, and purposes are omitted for clarity and denoted by ellipses.

In addition to the structure specified by the metamodel, some constraints can be used to ensure that a model is a valid attack tree. For example, the tree cannot contain cycles, the nodes must form a connected graph, etc. These constraints are separately formulated in the Epsilon Validation Language (EVL [17]). An example of such a constraint is shown in Listing 1.

*Values Metamodel.* The Values metamodel (Fig. 4, right side) describes how values are attributed to nodes (arrow from Attribute on the right to Node on the left). Each Attribute contains exactly one Value, which can be of various (basic or complex) types: For example, RealValue is a type of Value that contains real (Double) numbers. A Domain groups all those attributes that have the same Purpose. By separating the purpose of attributes from their data type, we can use basic data types (integer, boolean, real number) for different purposes: For example, a real number (RealType) can be used in a Domain named "Maximum Duration", where the purpose is a TimePurpose with timeType = MAXIMAL. A RealType number could also be used in a different Domain, say "Likelihood of attack" with the purpose to represent a probability (ProbabilityPurpose, not shown in the diagram). Thanks to the flexibility of this construct, the set of available domains is easily extensible.
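The separation of data type and purpose can be mimicked as follows (a sketch; class names mirror Fig. 4, and the concrete domains are the examples from the text):

```python
from dataclasses import dataclass

@dataclass
class Purpose:
    kind: str        # e.g. "time" or "probability"
    detail: str = "" # e.g. timeType = "MAXIMAL" for a TimePurpose

@dataclass
class Domain:
    name: str
    purpose: Purpose

@dataclass
class Attribute:
    domain: Domain
    value: float     # a RealValue; other Value types exist in the metamodel
    node_id: str     # the id of the Node the attribute decorates

# The same basic type (a real number) used for two different purposes:
max_duration = Domain("Maximum Duration", Purpose("time", "MAXIMAL"))
likelihood = Domain("Likelihood of attack", Purpose("probability"))

attrs = [Attribute(max_duration, 10.0, "steal_key"),
         Attribute(likelihood, 0.3, "steal_key")]
```

Adding a new kind of attribute then only requires a new Domain (and, if needed, a new Purpose), without touching the existing model instances.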

```
context ATMM!AttackTree {
  constraint OneAndOnlyOneChildWithoutParents {
    check : ATMM!Node.allInstances.select(n|n.parents.size() == 0).size() == 1
      and self.root == ATMM!Node.allInstances.select(n|n.parents.size() == 0).first()
  }
}
```
**Listing 1.** Constraint specifying that the root node is the only node in an ATMM AT with no parents.
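The same well-formedness checks (exactly one parentless node, which must be the root, plus the acyclicity constraint mentioned above) can be sketched in a general-purpose language; the dict-based node structure here is a simplified stand-in for ATMM instances:

```python
def is_valid_attack_tree(nodes, root):
    """Check the constraints described in the text: exactly one parentless
    node, that node is the root, and the children relation has no cycles."""
    parentless = [n for n in nodes if not n["parents"]]
    if len(parentless) != 1 or parentless[0] is not root:
        return False

    visiting, done = set(), set()

    def dfs(n):
        if id(n) in visiting:   # back edge: a cycle
            return False
        if id(n) in done:
            return True
        visiting.add(id(n))
        ok = all(dfs(c) for c in n["children"])
        visiting.discard(id(n))
        done.add(id(n))
        return ok

    return dfs(root)

root = {"parents": [], "children": []}
leaf = {"parents": [root], "children": []}
root["children"].append(leaf)
print(is_valid_attack_tree([root, leaf], root))  # True
```

In ATTop these checks run via the EVL engine before any transformation is attempted.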

**2. Query Metamodel.** Existing attack tree analysis tools such as ATE, ATCalc, and ADTool 2.0 support only a limited set of queries, lacking the flexibility to formulate one's own security queries. Using the MDE approach, we have developed the Query metamodel shown in Fig. 5. It allows a security practitioner to pose a wide range of qualitative and quantitative queries over attributes such as cost, time, damage, etc.

Using this metamodel in ATTop, a security practitioner can pose all the security queries available in the aforementioned tools. Furthermore, the metamodel goes beyond them by letting users tailor their own security queries. For example, it is possible to ask whether a successful attack can be carried out within 10 days and without spending more than \$900.

**Fig. 5.** The query metamodel. The types 'Domain' and 'Value' refer to the classes of the ATMM metamodel (Fig. 4).

The main component of the query metamodel is the element named Query. A query can be one of the following:


Furthermore, a query can be refined by combining one of the above query types with a set of Constraints over the AT attributes. A Constraint consists of a RelationalOperator, a Value, and its Domain. For example, the constraint "within 10 days" is expressed with the SMALLER RelationalOperator, a Value of 10, and the Domain "Maximum Duration".
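The example query from above ("a successful attack within 10 days and for at most \$900") could be represented along the following lines (a sketch of the query metamodel's concepts; the class names and the reading of SMALLER as "at most" are our assumptions):

```python
from dataclasses import dataclass

@dataclass
class Constraint:
    domain: str    # refers to a Domain of the ATMM, e.g. "Maximum Duration"
    operator: str  # a RelationalOperator, e.g. "SMALLER"
    value: float

@dataclass
class ReachabilityQuery:
    constraints: list

def satisfied(constraints, attack):
    """Check a concrete attack (a dict of domain -> value) against the
    constraints; SMALLER is read as <=, as in 'within 10 days' (assumption)."""
    ops = {"SMALLER": lambda x, y: x <= y, "GREATER": lambda x, y: x >= y}
    return all(ops[c.operator](attack[c.domain], c.value) for c in constraints)

query = ReachabilityQuery([
    Constraint("Maximum Duration", "SMALLER", 10),  # within 10 days
    Constraint("Cost", "SMALLER", 900),             # at most $900
])
```

A concrete attack with duration 8 and cost 270 satisfies this query, while one taking 12 days does not.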

**3. Scenario Metamodel.** ATTop is geared to provide different kinds of results. Some are numeric, like the probability of executing an attack or the maximum cost of an attack. Others contain qualitative information, such as an attack vector: a partially ordered set of basic attack steps resulting in the compromise of an asset under a given set of constraints (for example, incurring minimum cost). In order to properly trace such qualitative output back to the original attack tree, we use the Scenario metamodel (see Fig. 6).

The Scenario metamodel is used to represent attack vectors. In our context, we consider an attack vector to be a Schedule with only one Executor, which we name "Attacker". The sequence of Tasks appearing in a Scenario is then interpreted as the sequence of attack steps the Attacker needs to carry out in order to reach their objective. Each attack step is a node of the original AT, and is represented as an Executable whose name corresponds to the id of the original Node. Timing information contained in each Task describes the start (startTime) and end (endTime) time points of each attack step. Note that an attack step may have started but not yet finished when the objective is reached (hence multiplicity "1" for startTime and "0..1" for endTime).
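An instance of this metamodel might look as follows (a sketch; class and field names follow Fig. 6, while the concrete step names and times are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    executable: str                   # name = id of the original AT Node
    start_time: float                 # multiplicity 1: always present
    end_time: Optional[float] = None  # multiplicity 0..1: may be absent

@dataclass
class Schedule:
    executor: str                     # always "Attacker" for attack vectors
    tasks: list

attack_vector = Schedule("Attacker", [
    Task("steal_key", start_time=0.0, end_time=5.0),
    # Started but not yet finished when the objective is reached:
    Task("open_door", start_time=5.0),
])
```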

**Fig. 6.** The Scenario metamodel from [29]. In the context of ATs, all instances of this metamodel will have only one Executor, the Attacker; Executables represent attack steps (i.e. Nodes from the AT), while a Scenario is known as an attack vector.

#### **4 Model Transformations**

ATTop supports *horizontal* and *vertical* model transformations. Figure 7 illustrates the difference between these. *Horizontal transformations* convert one model into another that conforms to the same metamodel, e.g., a transformation from one AT analysis tool to another (where the models of both tools are represented in the ATMM metamodel). *Vertical transformations* transform a model into another that conforms to a different metamodel, e.g., the transformation from an AT into a timed automaton. A key feature of ATTop is that it also provides vertical transformations in the reverse direction: analysis results (e.g., traces produced by Uppaal) are interpreted in terms of the original attack tree model.

#### **4.1 Horizontal Transformations: Unifying Dialects of Attack Trees**

One of the goals of applying the model-driven approach is to facilitate interoperation between different tools. To this end, we provide transformations to and from the file formats of ADTool 2.0 [12], Attack Tree Evaluator (ATE) [5], and ATCalc [2].

Due to the different features supported by the various tools, not every input formalism can be converted to every other format with all semantics preserved. For example, ATCalc performs only timing analysis, while ADTool can also perform cost analysis of untimed attack trees. In such cases, the transformations convert whatever information the target format supports, omitting unsupported features. As the ATMM metamodel unifies the features of all the listed tools, transformations into this metamodel are lossless.

*Example 2. ATE Transformation.* The Attack Tree Evaluator [5] tool can only process binary trees. Using a simple transformation, we can transform any instance of the ATMM into a binary tree. A simplified version of this transformation, written in ETL, is given in Listing 2. This transformation is based on a recursive method that traverses the tree. For every node with more than two children, it nests all but the first child under a new node until no more than two children remain.

#### **4.2 Vertical Transformations: Analyzing ATs via Timed Automata**

Thus far we have described the transformations to and from dedicated tools for attack trees. In this section we introduce a vertical transformation which we use in ATTop to translate attack trees into the more general-purpose formalism of timed automata (TA). Specifically, we provide model transformations to TAs that can be analyzed by the Uppaal tool to obtain the wide range of qualitative and quantitative properties supported by the query metamodel.

Our transformation targets the Uppaal metamodel described in [29]. It transforms each element of the attack tree (i.e., each gate and basic attack step) into a timed automaton. These automata communicate via signals and together describe the behavior of the entire tree. For example, Fig. 8 shows the timed automaton obtained by transforming an attack step with a deterministic time to execute of 5 units.

**Fig. 7.** Examples of *horizontal* and *vertical* model transformations.

```
var structure := AttackTree.all.first();
structure.Root.NodeToBinary();

operation Node NodeToBinary() {
  if (self.Children.size() > 2) {
    var newNode = new Node();
    newNode.Parents.add(self);
    structure.Nodes.add(newNode);

    var replaceNodes := self.Children.excluding(self.Children.first());
    newNode.Children := replaceNodes;
    self.Children.removeAll(replaceNodes);
    self.Children.add(newNode);
  }
  for (child in self.Children) {
    child.NodeToBinary();
  }
}
```
**Listing 2.** Transformation of an ATMM attack tree to a binary AT

Depending on the features of the model and the desired property to be analyzed, the output of the transformation can be analyzed by different extensions of Uppaal. For example, Uppaal CORA supports the analysis of cost-optimal queries, such as "What is the lowest cost an attacker needs to incur in order to complete an attack?", while Uppaal-SMC supports statistical model checking, allowing the analysis of models with stochastic times and probabilistic attack steps with queries such as "What is the probability that an attacker successfully completes an attack within one hour?". The advantages of Uppaal CORA's exact results come at the cost of state-space explosion, which limits the applicability of this approach to larger problems. Conversely, the speed and scalability of the simulation-based Uppaal-SMC are countered by approximate results and the unavailability of (counter-)example traces.

**Fig. 8.** Example of a timed automaton modeling a basic attack step with a fixed time to execute of 5 units.

#### **4.3 Query Transformation: From Domain-Specific to Tool-Specific**

ATTop aims to enable the analysis of ATs even by users who are less familiar with the underlying tools. One challenge for such a user is that every tool has its own method of specifying which property of the AT should be computed.

Section 3 describes our metamodel for expressing a wide range of possible queries; we now transform such queries to a tool-specific format. Many tools support only a single query (e.g., ATE [5] only supports Pareto curves of cost vs. probability); in such cases no transformation is needed, and ATTop simply restricts the input to that single query.

The Uppaal tool is an example of a tool supporting many different queries. After transforming the AT to a timed automaton (cf. Sect. 4.2), we transform the query into the textual formula supported by Uppaal. The basic form of this formula is determined by the query type (e.g., a ReachabilityQuery will be translated as "E<> toplevel.completed", which asks for the existence of a trace that reaches the top level event), while constraints add additional terms limiting the permitted behavior of the model. By using an Uppaal-specific metamodel for its query language linked to the TA metamodel, our transformation can easily refer to the TA elements that correspond to converted AT elements.
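The translation scheme can be sketched as follows; only the base formula `E<> toplevel.completed` is taken from the text above, while the concrete syntax for appending constraint terms is our assumption:

```python
def to_uppaal_formula(query_type, constraints=()):
    """Sketch of the query-to-Uppaal translation described in the text.
    Only the base reachability formula 'E<> toplevel.completed' comes from
    the paper; the constraint-term syntax below is an assumption."""
    if query_type != "ReachabilityQuery":
        raise NotImplementedError("only reachability queries are sketched here")
    # The query type determines the basic form of the formula; constraints
    # add additional terms limiting the permitted behavior of the model.
    terms = ["toplevel.completed"]
    terms += [f"{var} {op} {bound}" for (var, op, bound) in constraints]
    return "E<> " + " && ".join(terms)

print(to_uppaal_formula("ReachabilityQuery"))
# E<> toplevel.completed
print(to_uppaal_formula("ReachabilityQuery", [("time", "<=", 240)]))
# E<> toplevel.completed && time <= 240
```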

#### **4.4 Result Transformation: From Tool-Specific to Domain-Specific**

Analyses performed with a back-end tool produce results that may only be immediately understandable to an expert in that tool. An important feature of ATTop that eases its use by non-experts is that it provides interpretations of these results in terms of the original AT.

For example, given an attack tree whose leaves are annotated with (time-dependent) costs, Uppaal can produce a trace showing the cheapest way to reach a security breach (optionally within a specified time bound). This trace is given in a textual format, with many details that are irrelevant to a security analyst. It is much easier to understand this scenario when shown in terms of the attack tree (for example, Fig. 11 is a scenario described by several pages of Uppaal output). This is exactly the purpose of the reverse transformations: Uppaal's textual traces are automatically parsed by ATTop, generating instances of the Trace metamodel described in [29]. To make this possible, the transformation from ATMM to Uppaal retains enough information to trace identifiers in the Uppaal model back to the elements of the AT. When parsing the trace, ATTop extracts only the relevant events (e.g., the starts and ends of attack steps) and related information (e.g., time). This information is then stored as an instance of the Scenario metamodel described in Sect. 3.

In the generated Schedule, attack steps are represented as Executables, while Tasks indicate the start and finish time of each attack step, thus describing the attack vector. Only one Executor is present in any attack vector produced by this transformation, and that is the Attacker. An example of such a generated schedule can be seen in Fig. 11.

## **5 Tool Support**

We have developed the tool ATTop to enable users to easily use the transformations described in this paper, without requiring knowledge of the underlying techniques or formalisms. ATTop automatically selects which transformations to apply based on the available inputs and desired outputs. For example, if the user provides an ADTool input and requests an Uppaal output, ATTop will automatically first execute the transformation from ADTool to the ATMM, and then the transformation from ATMM to Uppaal.
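The automatic selection of transformations can be realized as a shortest-path search over the available transformation steps; the sketch below uses breadth-first search, and the transformation registry shown is illustrative rather than ATTop's exact set:

```python
from collections import deque

# Available transformations (source language -> target languages).
# This particular registry is illustrative, not ATTop's actual one.
TRANSFORMATIONS = {
    "ADTool": ["ATMM"],
    "ATCalc": ["ATMM"],
    "ATMM":   ["ADTool", "ATCalc", "ATE", "Uppaal"],
}

def shortest_chain(source, target):
    """Breadth-first search for the shortest sequence of languages leading
    from source to target; returns None if no chain of transformations exists."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in TRANSFORMATIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_chain("ADTool", "Uppaal"))  # ['ADTool', 'ATMM', 'Uppaal']
```

This mirrors the example in the text: an ADTool input requesting an Uppaal output is routed via the ATMM.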

Users operate the tool by specifying input files and their corresponding languages, together with the desired output files and languages. ATTop then searches for the shortest sequence of transformations that produces the desired outputs from the inputs. For example, Fig. 9 shows the tool's main screen, where the user has provided an input AT in Galileo format. The user can now choose between different queries and analysis engines.

**Fig. 9.** Screenshot of ATTop's main screen, allowing input file selection, query specification, and output selection.

#### **6 Case Study**

As a case study we use the example annotated attack tree given in Fig. 2. We apply ATTop to automatically compute several qualitative and quantitative security metrics. Specifically, we apply a horizontal transformation to convert the model from the ATCalc format to that accepted by ADTool 2.0, and a vertical transformation to analyze the model using Uppaal.

We specify the AT in the Galileo format as accepted by ATCalc. Analysis with ATCalc yields a graph of the probability of a successful attack over time, as shown in Fig. 10. Next, we would like to determine the minimal cost of a successful attack, which ATCalc cannot provide. Therefore, we use ATTop to transform the AT to the ADTool 2.0 format, and use ADTool 2.0 to compute the minimal cost (yielding \$270).

**Fig. 10.** ATCalc plot showing the probability of a successful attack over time

Next, we perform a more comprehensive timing analysis using the vertical transformation described in Sect. 4.2. We use ATTop to transform the AT to a timed automaton that can be analyzed using the Uppaal tool. We also transform a query (an OptimalityQuery asking for minimal time) to the corresponding Uppaal query. Combining these, we obtain a trace for the fastest successful attack, which ATTop transforms into a scenario in terms of the AT as described in Sect. 4.4.

**Fig. 11.** Scenario of the fastest attack as computed by Uppaal. The executed steps and their start–end times are also shown in Fig. 2.

The resulting scenario is shown in Fig. 11. Running the whole process, including the transformations and the analysis with Uppaal, took 6.5 s on an Intel Core i7-860 CPU at 2.80 GHz running Ubuntu 16.04 LTS.

#### **7 Conclusions**

We have presented a model-driven approach to the analysis of attack trees and a software bridging tool—ATTop—implementing this approach. We support interoperability between different existing analysis tools, as well as our own analysis using the popular tool Uppaal as a back-end engine.

Formal methods have the advantage of being precise, unambiguous and systematic. A lot of effort is spent on their correctness proofs. However, these benefits are only reaped if the tools supporting formal analysis are also correct. To the best of our knowledge, this work is among the first to apply the systematic approach of MDE to the development of formal analysis tools.

Through model-driven engineering, we have developed the attack tree metamodel (ATMM) with support for the many extended formalisms of attack trees, integrating most of the features of such extensions. This unified metamodel provides a common representation of attack trees, allowing easy transformations from and to the specific representations of individual tools such as ATCalc [2] and ADTool [12]. The metamodels for queries and schedules facilitate a user-friendly interface, capturing relevant questions and presenting results without requiring expert knowledge of the underlying analysis tool.

We have presented our approach specifically for attack trees, but we believe it can be equally fruitful for other formalisms and tools (e.g., PRISM [24], Storm [9]), by using different metamodels and model transformations. We thus expect our approach to be useful in the development of other tools that bridge specialized domains and formal methods.

**Acknowledgments.** This research was partially funded by STW and ProRail under the project ArRangeer (grant 12238), STW, TNO-ESI, Océ and PANalytical under the project SUMBAT (13859), STW project SEQUOIA (15474), NWO projects BEAT (612001303) and SamSam (628.005.015), and EU project SUCCESS (102112).

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Distributed Program and System Analysis

# **ROLA: A New Distributed Transaction Protocol and Its Formal Analysis**

Si Liu<sup>1(B)</sup>, Peter Csaba Ölveczky<sup>2</sup>, Keshav Santhanam<sup>1</sup>, Qi Wang<sup>1</sup>, Indranil Gupta<sup>1</sup>, and José Meseguer<sup>1</sup>

> <sup>1</sup> University of Illinois, Urbana-Champaign, USA siliu3@illinois.edu
> <sup>2</sup> University of Oslo, Oslo, Norway

**Abstract.** Designers of distributed database systems face the choice between stronger consistency guarantees and better performance. A number of applications only require *read atomicity* (RA) and *prevention of lost updates* (PLU). Existing distributed database systems that meet these requirements also provide additional stronger consistency guarantees (such as *causal consistency*), and therefore incur a performance penalty. In this paper we define a new distributed transaction protocol, ROLA, that targets applications where only RA and PLU are needed. We formally model ROLA in Maude. We then perform model checking to analyze both the correctness and the performance of ROLA. For *correctness*, we use standard model checking to analyze ROLA's satisfaction of RA and PLU. To analyze *performance* we: (a) use statistical model checking to analyze key performance properties; and (b) compare these performance results with those obtained by analyzing in Maude the well-known protocol Walter. Our results show that ROLA outperforms Walter.

## **1 Introduction**

Distributed transaction protocols are complex distributed systems whose design is quite challenging because: (i) correctness is very hard to validate by testing alone; (ii) the high performance required in many applications is hard to measure before implementation; and (iii) there is an unavoidable tension between the *degree of consistency* needed for the intended applications and the *high performance* required of the transaction protocol for such applications: balancing these two requirements well is essential.

In this work, we present our results on how to use formal modeling and analysis as early as possible in the design process to arrive at a mature design of a *new* distributed transaction protocol, called ROLA, meeting specific correctness and performance requirements *before* such a protocol is implemented. In this way, the above-mentioned design challenges (i)–(iii) can be adequately met. We also show how using this formal design approach it is relatively easy to *compare* ROLA with other existing transaction protocols.

**ROLA in a Nutshell.** Different applications require negotiating the consistency vs. performance trade-offs in different ways. The key issue is the application's required *degree of consistency*, and how to meet such requirements with *high performance*. Cerone *et al.* [4] survey a *hierarchy of consistency models* for distributed transaction protocols including (in increasing order of strength):


A key property of transaction protocols is the *prevention of lost updates* (PLU). The weakest consistency model in [4] satisfying both RA and PLU is PSI. However, PSI, and the well-known protocol Walter [20] implementing PSI, also guarantee CC. Cerone *et al.* conjecture that a system guaranteeing RA and PLU *without* guaranteeing CC should be useful, but up to now we are not aware of any such protocol. The point of ROLA is exactly to fill this gap: guaranteeing RA and PLU, but not CC. Two key questions are then: (a) are there *applications* needing high performance where RA plus PLU provide a sufficient degree of consistency? and (b) can a new design meeting RA plus PLU *outperform* existing designs, like Walter, meeting PSI?

Regarding question (a), an example of a transaction that requires RA and PLU but not CC is the "becoming friends" transaction on social media. Bailis *et al.* [3] point out that RA is crucial for this operation: If Edinson and Neymar become friends, then Unai should not see a *fractured read* where Edinson is a friend of Neymar, but Neymar is not a friend of Edinson. An implementation of "becoming friends" must obviously guarantee PLU: the new friendship between Edinson and Neymar should not be lost. Finally, CC could be sacrificed for the sake of performance: Assume that Dani is a friend of Neymar. When Edinson becomes Neymar's friend, he sees that Dani is Neymar's friend, and therefore also becomes friends with Dani. The second friendship therefore causally depends on the first one. However, it does not seem crucial that others are aware of this causality: If Unai sees that Edinson and Dani are friends, then it is not necessary that he knows that (this happened *because*) Edinson and Neymar are friends.

Regarding question (b), Sect. 6 shows that ROLA clearly outperforms Walter in all performance requirements for all read/write transaction rates.

**Maude-Based Formal Modeling and Analysis.** In rewriting logic [16], distributed systems are specified as *rewrite theories*. Maude [5] is a high-performance language implementing rewriting logic and supporting various model checking analyses. To model time and performance issues, ROLA is specified in Maude as a *probabilistic rewrite theory* [1,5]. ROLA's RA and PLU requirements are then analyzed by standard model checking, where we disregard time issues. To estimate ROLA's performance, and to compare it with that of Walter, we have also specified Walter in Maude, and subjected the Maude models of both ROLA and Walter to *statistical model checking* analysis using the PVeStA [2] tool.

**Main Contributions** include: (1) the design, formal modeling, and model checking analysis of ROLA, a new transaction protocol having useful applications and meeting RA and PLU consistency properties with competitive performance; (2) a detailed performance comparison by statistical model checking between ROLA and the Walter protocol showing that ROLA outperforms Walter in all such comparisons; (3) to the best of our knowledge the first demonstration that, by a suitable use of formal methods, a completely new distributed transaction protocol can be designed and thoroughly analyzed, as well as be compared with other designs, very early on, *before* its implementation.

## **2 Preliminaries**

**Read-Atomic Multi-Partition (RAMP) Transactions.** To deal with ever-increasing amounts of data, large cloud systems *partition* their data across multiple data centers. However, guaranteeing strong consistency properties for multi-partition transactions leads to high latency. Therefore, trade-offs that combine efficiency with weaker transactional guarantees for such transactions are needed.

In [3], Bailis *et al.* propose an isolation model, *read atomic* isolation, and *Read Atomic Multi-Partition* (RAMP) transactions, that together provide efficient multi-partition operations that guarantee read atomicity (RA).

RAMP uses multi-versioning and attaches metadata to each write. Reads use this metadata to get the correct version. There are three versions of RAMP; in this paper we build on RAMP-Fast. To guarantee that all partitions perform a transaction successfully or that none do, RAMP performs two-phase writes using the two-phase commit protocol (2PC). In the *prepare* phase, each timestamped write is sent to its partition, which adds the write to its local database.<sup>1</sup> In the *commit* phase, each such partition updates an index which contains the highest-timestamped committed version of each item stored at the partition.

RAMP assumes that there is no data *replication*: a data item is only stored at one partition. The timestamps generated by a partition P are unique identifiers, but are sequentially increasing only with respect to P. A partition has access to the methods `get_all(I : set of items)` and `put_all(W : set of ⟨item, value⟩ pairs)`.

`put_all` uses two-phase commit for each write w in W. The first phase initiates a *prepare* operation on the partition storing w.item, and the second phase completes the commit if each write partition agrees to commit. In the first phase, the client (i.e., the partition executing the transaction) passes a *version* v = ⟨item, value, ts<sub>v</sub>, md⟩ to the partition, where ts<sub>v</sub> is a timestamp generated for the transaction and md is metadata listing all other items modified in the same transaction. Upon receiving this version v, the partition adds it to a set *versions*.

<sup>1</sup> RAMP does not consider write-write conflicts, so that writes are always prepared successfully (which is why RAMP does not prevent lost updates).

When a client initiates a `get_all` operation, then for each i ∈ I the client first requests the latest version stored on the server for i. It then looks at the metadata in the version returned by the server, iterating over each item in the metadata set. If it finds an item in the metadata that has a later timestamp than the ts<sub>v</sub> in the returned version, this means the value for i is out of date. The client can then request the RA-consistent version of i.
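The read logic just described can be sketched as follows; this is a simplification with hypothetical data structures and a single round of repair, and the real RAMP-Fast algorithm in [3] differs in detail:

```python
def get_all(items, server_latest, versions):
    """Sketch of RAMP-Fast's read logic: fetch the latest version of each
    item, then use the attached metadata (the sibling items written by the
    same transaction, sharing its timestamp) to detect and repair fractured
    reads.  Simplified data structures: server_latest maps item -> version,
    versions maps (item, ts) -> version, and a version is a dict with keys
    'value', 'ts' and 'md' (md = the other items of the same transaction)."""
    result = {i: server_latest[i] for i in items}
    for i in items:
        # Highest timestamp among the read versions whose transaction also wrote i:
        required = max([v["ts"] for v in result.values() if i in v["md"]],
                       default=result[i]["ts"])
        if required > result[i]["ts"]:           # our copy of i is out of date
            result[i] = versions[(i, required)]  # fetch the RA-consistent version
    return {i: v["value"] for i, v in result.items()}

# Example: a transaction with timestamp 2 wrote x and y, but the read of y
# initially returns a stale version from timestamp 1 (a fractured read).
server_latest = {"x": {"value": "x2", "ts": 2, "md": {"y"}},
                 "y": {"value": "y1", "ts": 1, "md": set()}}
versions = {("y", 2): {"value": "y2", "ts": 2, "md": {"x"}}}
print(get_all(["x", "y"], server_latest, versions))  # {'x': 'x2', 'y': 'y2'}
```

The metadata makes the fractured read detectable: x's version at timestamp 2 records that the same transaction also wrote y, so the stale y at timestamp 1 is repaired.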

**Rewriting Logic and Maude.** In rewriting logic [16] a concurrent system is specified as a *rewrite theory* (Σ, E ∪ A, R), where (Σ, E ∪ A) is a *membership equational logic theory* [5], with Σ an algebraic signature declaring sorts, subsorts, and function symbols, E a set of conditional equations, and A a set of equational axioms. It specifies the system's state space as an algebraic data type. R is a set of *labeled conditional rewrite rules*, specifying the system's local transitions, of the form [l] : t −→ t′ **if** *cond*, where *cond* is a condition and l is a label. Such a rule specifies a transition from an instance of t to the corresponding instance of t′, provided the condition holds.

Maude [5] is a language and tool for specifying, simulating, and model checking rewrite theories. The distributed state of an object-oriented system is formalized as a *multiset* of objects and messages. A class C with attributes att<sub>1</sub> to att<sub>n</sub> of sorts s<sub>1</sub> to s<sub>n</sub> is declared `class C | att1 : s1, ..., attn : sn`. An object of class C is modeled as a term `< o : C | att1 : v1, ..., attn : vn >`, with o its object identifier, and where the attributes att<sub>1</sub> to att<sub>n</sub> have the current values v<sub>1</sub> to v<sub>n</sub>, respectively. Upon receiving a message, an object can change its state and/or send messages to other objects. For example, the rewrite rule

rl [l] : m(O,z) < O : C | a1 : x, a2 : O' > => < O : C | a1 : x + z, a2 : O' > m'(O',x + z) .

defines a transition where an incoming message m, with parameters O and z, is consumed by the target object O of class C, the attribute a1 is updated to x + z, and an outgoing message m'(O',x + z) is generated.

**Statistical Model Checking and PVESTA.** Probabilistic distributed systems can be modeled as *probabilistic rewrite theories* [1] with rules of the form

$$[l] : t(\overrightarrow{x}) \longrightarrow t'(\overrightarrow{x}, \overrightarrow{y}) \ \textbf{if}\ cond(\overrightarrow{x}) \ \textbf{with probability}\ \overrightarrow{y} := \pi(\overrightarrow{x})$$

where the term t′ has new variables y⃗ disjoint from the variables x⃗ in the term t. The concrete values of the new variables y⃗ in t′(x⃗, y⃗) are chosen probabilistically according to the probability distribution π(x⃗).

Statistical model checking [18,21] is an attractive formal approach to analyzing (purely) probabilistic systems. Instead of offering a yes/no answer, it can verify a property up to a user-specified level of confidence by running Monte-Carlo simulations of the system model. We use PVeStA [2], a parallelization of the tool VeStA [19], to statistically model check purely probabilistic systems against properties expressed as QuaTEx expressions [1]. The expected value of a QuaTEx expression is iteratively evaluated w.r.t. two parameters α and δ by sampling, until we obtain a value v such that, with (1 − α)·100% statistical confidence, the expected value lies in the interval [v − δ/2, v + δ/2].
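The sample-until-converged idea can be illustrated with a small Python sketch. This is a generic Monte-Carlo estimator with a normal-approximation confidence interval, not PVeStA's actual algorithm; the function name, the batch size, and the fixed z-quantile are illustrative choices.

```python
# Illustrative sketch: estimate the expected value of a random quantity by
# Monte-Carlo sampling until the (1 - alpha) confidence interval is narrower
# than delta, i.e. until the estimate v pins the mean to [v - d/2, v + d/2].
import math
import random

def estimate(sample, delta=0.1, batch=100, z=1.96):
    # z = 1.96 approximates the normal quantile for alpha = 0.05
    xs = []
    while True:
        xs.extend(sample() for _ in range(batch))
        n = len(xs)
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / (n - 1)
        half_width = z * math.sqrt(var / n)
        if 2.0 * half_width <= delta:   # interval width at most delta
            return mean

random.seed(42)
v = estimate(lambda: random.random())   # true expected value of U(0,1) is 0.5
```

Tightening `delta` or `alpha` increases the number of simulation runs, which is exactly the cost/precision trade-off PVeStA parallelizes.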

## **3 The ROLA Multi-Partition Transaction Algorithm**

Our new algorithm for distributed multi-partition transactions, ROLA, extends RAMP-Fast. RAMP-Fast guarantees RA, but it does not guarantee PLU, since it allows a write to overwrite conflicting writes: when a partition commits a write, it only compares the write's timestamp t<sub>1</sub> with the local latest-committed timestamp t<sub>2</sub>, and updates the latest-committed timestamp to t<sub>1</sub> or t<sub>2</sub>. If the two timestamps are from two conflicting writes, then one of the writes is lost.

ROLA's key idea to prevent lost updates is to sequentially order writes on the same key from a partition's perspective by adding to each partition a data structure which maps each incoming version to an incremental sequence number. For write-only transactions the mapping can always be built; for a read-write transaction the mapping can only be built if there has not been a mapping built since the transaction fetched the value. This can be checked by comparing the last prepared version's timestamp's mapping on the partition with the fetched version's timestamp's mapping. In this way, ROLA prevents lost updates by allowing versions to be prepared only if no conflicting prepares occur concurrently.

More specifically, ROLA adds two partition-side data structures: *sqn*, denoting the local sequence counter, and *seq*[*ts*], which maps a timestamp to a local sequence number. ROLA also changes the data structure of *versions* in RAMP from a set to a list. ROLA then adds two methods: the coordinator-side<sup>2</sup> method update(I : set of items, *OP* : set of operations) and the partition-side method prepare update(v : version, tsprev : timestamp) for read-write transactions. Furthermore, ROLA changes two partition-side methods in RAMP: prepare, besides adding the version to the local store, maps its timestamp to the increased local sequence number; and commit marks versions as committed and updates an index containing the highest-sequenced-timestamped committed version of each item. These two partition-side methods apply to both write-only and read-write transactions. ROLA invokes RAMP-Fast's put all, get all and get methods (see [3,14]) to deal with read-only and write-only transactions.

ROLA starts a read-write transaction with the update procedure. It invokes RAMP-Fast's get all method to retrieve the values of the items the client wants to update, as well as their corresponding timestamps. ROLA writes then proceed in two phases: a first round of communication places each timestamped write on its respective partition. The timestamp of each version obtained previously from the get all call is also packaged in this *prepare* message. A second round of communication marks versions as committed.

On the partition side, the prepare update routine begins by retrieving the last version in the partition's *versions* list with the same item as the received version. If no such version is found, or if that version's timestamp *ts*<sub>v</sub> matches

<sup>2</sup> The *coordinator*, or *client*, is the partition executing the transaction.

#### **Algorithm 1.** ROLA

#### *Server-side Data Structures*


#### *Server-side Methods*

get same as in RAMP-Fast

```
5: procedure prepare update(v : version, tsprev : timestamp)
6: latest ← last w ∈ versions : w.item = v.item
7: if latest = null or tsprev = latest.tsv then
8: sqn ← sqn + 1; seq[v.tsv] ← sqn; versions.add(v)
9: return ack
10: else return latest
11: procedure prepare(v : version)
12: sqn ← sqn + 1; seq[v.tsv] ← sqn; versions.add(v)
13: procedure commit(tsc : timestamp)
14: Its ← {w.item | w ∈ versions ∧ w.tsv = tsc}
15: for i ∈ Its do
16: if seq[tsc] > seq[latestCommit[i]] then latestCommit[i] ← tsc
```
#### *Coordinator-side Methods*

put all, get all same as in RAMP-Fast

```
17: procedure update(I : set of items, OP : set of operations)
18: ret ← get all(I); tstx ← generate new timestamp
19: parallel-for i ∈ I do
20: tsprev ← ret[i].tsv; v ← ret[i].value
21: w ← ⟨item = i, value = opi(v), tsv = tstx, md = (I − {i})⟩
22: p ← prepare update(w,tsprev)
23: if p = latest then
24: invoke application logic to, e.g., abort and/or retry the transaction
25: end parallel-for
26: parallel-for server s : s contains an item in I do
27: invoke commit(tstx) on s
28: end parallel-for
```
the passed-in timestamp *tsprev*, then the version is deemed prepared. The partition keeps a record of this locally by incrementing a local sequence counter and mapping the received version's timestamp *ts*<sub>v</sub> to the current value of the sequence counter. Finally, the partition returns an ack to the client. If *tsprev* does not match the timestamp of the last version in *versions* with the same item, then this *latest* timestamp is simply returned to the coordinator.

If the coordinator receives an ack from prepare update, it immediately commits the version with the generated timestamp ts<sub>tx</sub>. If the returned value is instead a timestamp, the transaction is aborted.
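For illustration, the partition-side methods of Algorithm 1 (lines 5-16) can be transcribed into Python roughly as follows. This is a sketch, not the authors' implementation: versions are plain dicts, timestamps are opaque comparable values, and the empty timestamp of an initial read is modeled as `None`.

```python
# Sketch of ROLA's partition-side state and methods (Algorithm 1, lines 5-16).
class Partition:
    def __init__(self):
        self.versions = []        # prepared versions, in arrival order
        self.sqn = 0              # local sequence counter
        self.seq = {}             # timestamp -> local sequence number
        self.latest_commit = {}   # item -> highest-sequenced committed ts

    def prepare_update(self, v, ts_prev):
        # lines 6-10: find the last prepared version of the same item
        latest = next((w for w in reversed(self.versions)
                       if w["item"] == v["item"]), None)
        if latest is None or ts_prev == latest["ts"]:
            self.sqn += 1
            self.seq[v["ts"]] = self.sqn
            self.versions.append(v)
            return "ack"
        return latest             # conflicting prepare: coordinator aborts

    def prepare(self, v):
        # lines 11-12: write-only transactions always prepare
        self.sqn += 1
        self.seq[v["ts"]] = self.sqn
        self.versions.append(v)

    def commit(self, ts_c):
        # lines 13-16: mark versions with timestamp ts_c as committed
        for w in self.versions:
            if w["ts"] == ts_c:
                i = w["item"]
                prev = self.latest_commit.get(i)
                if prev is None or self.seq[ts_c] > self.seq[prev]:
                    self.latest_commit[i] = ts_c
```

A second read-write transaction that prepared against a stale `ts_prev` is rejected, which is exactly how ROLA prevents lost updates.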

### **4 A Probabilistic Model of ROLA**

This section defines a formal executable probabilistic model of ROLA. The whole model is given at https://sites.google.com/site/fase18submission/.

As mentioned in Sect. 2, statistical model checking assumes that the system is *fully probabilistic*; that is, it has no unquantified nondeterminism. We follow the techniques in [6] to obtain such a model. The key idea is that message delays are sampled probabilistically from dense/continuous time intervals. The probability that two messages have the same delay is therefore 0. If events only take place when a message arrives, then no two events happen at the same time, and unquantified nondeterminism is thereby eliminated.

We are also interested in correctness analysis of a model that captures all possible behaviors from a given initial configuration. We obtain such a nondeterministic untimed model, which can be subjected to standard model checking analysis, simply by removing all message delays from our probabilistic timed model.

#### **4.1 Probabilistic Sampling**

Nodes send messages of the form [Δ, *rcvr* <- *msg*], where Δ is the message delay, *rcvr* is the recipient, and *msg* is the message content. When time Δ has elapsed, this message becomes a *ripe* message {*T*, *rcvr* <- *msg*}, where T is the "current global time" (used for analysis purposes only).

To sample message delays from different distributions, we use the following functionality provided by Maude: the function random, where random(k) returns the k-th pseudo-random number as a number between 0 and 2<sup>32</sup> − 1, and the built-in constant counter with an (implicit) rewrite rule counter => N:Nat. The first time counter is rewritten, it rewrites to 0, the next time to 1, and so on. Therefore, each time random(counter) rewrites, it rewrites to the next random number. Since Maude does not rewrite counter when it appears in the condition of a rewrite rule, we encode a probabilistic rewrite rule t(x⃗) −→ t′(x⃗, y⃗) **if** cond(x⃗) **with probability** y⃗ := π(x⃗) in Maude as the rule t(x⃗) −→ t′(x⃗, *sample*(π(x⃗))) **if** cond(x⃗). The following operator sampleLogNormal is used to sample a value from a lognormal distribution with mean MEAN and standard deviation SD:

```
op sampleLogNormal : Float Float -> [Float] .
eq sampleLogNormal(MEAN,SD) = exp(MEAN + SD * sampleNormal) .
op sampleNormal : -> [Float] . op sampleNormal : Float -> [Float] .
eq sampleNormal = sampleNormal(float(random(counter) / 4294967296)) .
eq sampleNormal(RAND) = sqrt(- 2.0 * log(RAND)) * cos(2.0 * pi * RAND) .
```
The expression random(counter) / 4294967296 rewrites to a different pseudo-random number between 0 and 1 each time, and this is used to define the sampling function. For example, the message delay rd to a remote site can be sampled from a lognormal distribution with mean 3 and standard deviation 2 as follows:

```
eq rd = sampleLogNormal(3.0, 2.0) .
```
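For comparison, the same sampler can be written in Python. This is an illustrative re-implementation of the Maude equations above; note that, exactly as in the Maude code, a single uniform number drives both factors of the Box-Muller transform.

```python
# Python mirror of the Maude sampler above: one Box-Muller draw fed by a
# single uniform pseudo-random number, exponentiated for the lognormal.
import math
import random

def sample_normal(rand):
    # one branch of the Box-Muller transform, as in sampleNormal(RAND)
    return math.sqrt(-2.0 * math.log(rand)) * math.cos(2.0 * math.pi * rand)

def sample_log_normal(mean, sd):
    # as in sampleLogNormal(MEAN, SD): exp(MEAN + SD * sampleNormal)
    return math.exp(mean + sd * sample_normal(random.random()))

rd = sample_log_normal(3.0, 2.0)   # remote delay, lognormal with mean 3, sd 2
```

Reusing the same uniform number for both the radius and the angle is a simplification the Maude model makes; a library-grade sampler would draw two independent uniforms.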
#### **4.2 Data Types, Classes, and Messages**

We formalize ROLA in an object-oriented style, where the state consists of a number of *partition* objects, each modeling a partition of the database, and a number of messages traveling between the objects. A *transaction* is formalized as an object which resides inside the partition object that executes the transaction.

*Data Types.* A *version* is a timestamped version of a data item (or key) and is modeled as a 4-tuple version(*key*, *value*, *timestamp*, *metadata*). A timestamp is modeled as a pair ts(*addr* , *sqn*) consisting of a partition's identifier *addr* and a local sequence number *sqn*. Metadata are modeled as a set of keys, denoting, for each key, the other keys that are written in the same transaction.

The sort OperationList represents lists of read and write operations as terms such as (x := read k1) (y := read k2) write(k1, x + y), where LocalVar denotes the "local variable" that stores the value of the key read by the operation, and Expression is an expression involving the transaction's local variables:

```
op write : Key Expression -> Operation [ctor] .
op _:=read_ : LocalVar Key -> Operation [ctor] .
pr LIST{Operation} * (sort List{Operation} to OperationList) .
```
*Classes.* A *transaction* is modeled as an object of the following class Txn:

class Txn | operations : OperationList, readSet : Versions, localVars : LocalVars, latest : KeyTimestamps .

The operations attribute denotes the transaction's operations. The readSet attribute denotes the versions read by the read operations. localVars maps the transaction's local variables to their current values. latest stores the local view as a mapping from keys to their respective latest committed timestamps.

A *partition* (or *site*) stores parts of the database, and executes the transactions for which it is the coordinator/server. A partition is formalized as an object instance of the following class Partition:


The datastore attribute represents the partition's local database as a list of versions for each key stored at the partition. The attribute latestCommit maps to each key the timestamp of its last committed version. tsSqn maps each version's timestamp to a local sequence number sqn. The attributes gotTxns, executing, committed and aborted denote the transaction(s) which are, respectively, waiting to be executed, currently executing, committed, and aborted.

The attribute votes stores the votes in the two-phase commit. The remaining attributes denote the partitions from which the executing partition is awaiting votes, committed acks, first-round get replies, and second-round get replies.

The following shows an initial state (with some parts replaced by '...') with two partitions, p1 and p2, that are coordinators for, respectively, transaction t1, and transactions t2 and t3. p1 stores the data items x and z, and p2 stores y. Transaction t1 is the read-only transaction (xl := read x) (yl := read y), transaction t2 is the write-only transaction write(y, 3) write(z, 8), and transaction t3 is a read-write transaction on data item x. The state also includes a buffer of messages in transit, the global clock value, and a table assigning to each data item the site storing it. Initially, the value of each item is [0], the version's timestamp is the empty timestamp (eptTS), and its metadata is the empty set.

```
eq init = { 0.0 | nil}
< tb : Table | table : [sites(x, p1) ;; sites(y, p2) ;; sites(z, p1)] >
< p1 : Partition |
        gotTxns: < t1 : Txn | operations: ((xl :=read x) (yl :=read y)),
                               readSet: empty, latest: empty,
                               localVars: (xl |-> [0], yl |-> [0]) >,
        datastore: (version(x, [0], eptTS, empty)
                    version(z, [0], eptTS, empty)),
        sqn: 1, ... >
< p2 : Partition |
        gotTxns: < t2 : Txn | operations: (write(y, 3) write(z, 8)), ... >
                 < t3 : Txn | operations: ((xl := read x)
                                            write(x, xl plus 1)), ... >
        datastore: version(y, [0], eptTS, empty), ... > .
```
*Messages.* The message prepare(*txn*, *version*, *sender*) sends a version from a write-only transaction to its partition, and prepare(*txn*, *version*, *ts*, *sender*) does the same thing for other transactions, with *ts* the timestamp of the version it read. The partition replies with a message prepare-reply(*txn*, *vote*, *sender*), where *vote* tells whether this partition can commit the transaction. A message commit(*txn*, *ts*, *sender*) marks the versions with timestamp *ts* as committed. get(*txn*, *key*, *ts*, *sender*) asks for the highest-timestamped committed version or a missing version for *key* by timestamp *ts*, and response1(*txn*, *version*, *sender*) and response2(*txn*, *version*, *sender*) respond to first/second-round get requests.

### **4.3 Formalizing ROLA's Behaviors**

This section formalizes the dynamic behaviors of ROLA using rewrite rules, referring to the corresponding lines in Algorithm 1. We only show 2 of the 15 rewrite rules in our model, and refer to the report [14] for further details.<sup>3</sup>

*Receiving prepare Messages (lines 5–10).* When a partition receives a prepare message for a read-write transaction, the partition first determines whether the timestamp of the last version (VERSION) in its local version list VS matches the incoming timestamp TS' (which is the timestamp of the version read by the transaction). If so, the incoming version is added to the local store, the map tsSqn is updated, and a positive reply (true) to the prepare message is sent ("**return** *ack*" in our pseudo-code); otherwise, a negative reply (false, or "**return** *latest*" in the pseudo-code) is sent. Depending on whether the sender PID' of the *prepare* message happens to be PID itself, the reply is equipped with a local message delay ld or a remote message delay rd, both of which are sampled probabilistically from distributions with different parameters:<sup>4</sup>

```
crl [receive-prepare-rw] :
    {T, PID <- prepare(TID, version(K, V, TS, MD), TS', PID')}
    < PID : Partition | datastore: VS, sqn: SQN, tsSqn: TSSQN, AS' >
   =>
    if VERSION == eptVersion or tstamp(VERSION) == TS'
    then < PID : Partition | datastore: (VS version(K,V,TS,MD)), sqn: SQN',
                            tsSqn: insert(TS,SQN',TSSQN), AS' >
         [if PID == PID' then ld else rd fi,
             PID' <- prepare-reply(TID, true, PID)]
    else < PID : Partition | datastore: VS, sqn: SQN, tsSqn: TSSQN, AS' >
         [if PID == PID' then ld else rd fi,
             PID' <- prepare-reply(TID, false, PID)] fi
    if SQN' := SQN + 1 /\ VERSION := latestPrepared(K,VS) .
```
*Receiving Negative Replies (lines 23–24).* When a site receives a prepare-reply message with vote false, it aborts the transaction by moving it to the aborted list, and removes PID' from the "vote waiting list" for this transaction:

```
rl [receive-prepare-reply-false-executing] :
   {T, PID <- prepare-reply(TID, false, PID')}
   < PID : Partition | executing: < TID : Txn | AS >, aborted: TXNS,
                        voteSites: VSTS addrs(TID, (PID' , PIDS)), AS' >
 =>
   < PID : Partition | executing: noTxn,
                        aborted: (TXNS ;; < TID : Txn | AS >),
                        voteSites: VSTS addrs(TID, PIDS), AS' > .
```
<sup>3</sup> We do not give variable declarations, but follow the convention that variables are written in (all) capital letters.

<sup>4</sup> The variable AS' denotes the "remaining" attributes in the object.

#### **5 Correctness Analysis of ROLA**

In this section we use reachability analysis to analyze whether ROLA guarantees read atomicity and prevents lost updates.

For both correctness and performance analysis, we add to the state an object

```
< m : Monitor | log: log >
```
which stores crucial information about each transaction. The *log* is a list of records record(*tid*, *issueTime*, *finishTime*, *reads*,*writes*, *committed*), with *tid* the transaction's ID, *issueTime* its issue time, *finishTime* its commit/abort time, *reads* the versions read, *writes* the versions written, and *committed* a flag that is true if the transaction is committed.

We modify our model by updating the Monitor when needed. For example, when the coordinator has received all committed messages, the monitor records the commit time (T) for that transaction, and sets the "committed" flag to true<sup>5</sup>:

```
crl [receive-committed] :
    {T, PID <- committed(TID, PID')}
    < M : Monitor | log: (LOG record(TID, T', T'', RS, WS, false) LOG') >
    < PID : Partition | executing: < TID : Txn | AS >,
                         committed: TXNS, commitSites: CMTS, AS' >
   =>
    if CMTS'[TID] == empty --- all "committed" received
    then < M : Monitor | log: (LOG record(TID, T', T, RS, WS, true) LOG') >
         < PID : Partition | executing: noTxn, commitSites: CMTS',
                            committed: (TXNS ;; < TID : Txn | AS >), AS' >
    else < M : Monitor | log: (LOG record(TID, T', T'', RS, WS, false) LOG') >
         < PID : Partition | executing: < TID : Txn | AS >,
                            committed: TXNS, commitSites: CMTS', AS' > fi
    if CMTS' := remove(TID, PID', CMTS) .
```
Since ROLA is terminating if a finite number of transactions are issued, we analyze the different (correctness and performance) properties by inspecting this monitor object in the final states, when all transactions are finished.

*Read Atomicity.* A system guarantees RA if it prevents fractured reads, and also prevents transactions from reading uncommitted, aborted, or intermediate data [3], where a transaction T<sub>j</sub> exhibits *fractured reads* if transaction T<sub>i</sub> writes versions x<sub>m</sub> and y<sub>n</sub>, and T<sub>j</sub> reads version x<sub>m</sub> and version y<sub>k</sub>, with k < n [3].

We analyze this property by searching for a reachable *final* state (arrow =>!) where the property does *not* hold:

search [1] initConfig =>! C:Config < M:Address : Monitor | log: LOG:Record > such that fracRead(LOG) or abortedRead(LOG) .

<sup>5</sup> The additions to the original rule are written in italics.

The function fracRead checks whether there are fractured reads in the execution log. There is a fractured read if a transaction TID2 reads X and Y, transaction TID1 writes X and Y, TID2 reads the version TSX of X written by TID1, and reads a version TSY' of Y written *before* TSY (TSY' < TSY). Since the transactions in the log are ordered according to start time, TID2 could appear *before* or *after* TID1 in the log. We spell out the case when TID1 comes before TID2:

```
op fracRead : Record -> Bool .
ceq fracRead(LOG ;
     record(TID1,T1,T1',RS1, (version(X,VX,TSX,MDX), version(Y,VY,TSY,MDY)),true) ; LOG' ;
     record(TID2,T2,T2',(version(X,VX,TSX,MDX), version(Y,VY',TSY',MDY')), WS2,true) ; LOG'')
   = true if TSY' < TSY .
ceq fracRead(LOG ; record(TID2, ...) ; LOG' ; record(TID1, ...) ; LOG'') = true if TSY' < TSY .
eq fracRead(LOG) = false [owise] .
```
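The same check can be sketched in Python over a list of log records, with the field layout of Sect. 5. As a simplification of the version records used above, reads and writes are modeled here as key-to-timestamp dicts, and all pairs of records are scanned, so the order of TID1 and TID2 in the log is irrelevant.

```python
# Illustrative fractured-read check over an execution log.
# Each record: (tid, issue_time, finish_time, reads, writes, committed),
# where reads/writes map a key to the timestamp of the version read/written.
def frac_read(log):
    for t1 in log:
        for t2 in log:
            if not (t1[5] and t2[5]):        # both must be committed
                continue
            writes, reads = t1[4], t2[3]
            for x in writes:
                if x in reads and reads[x] == writes[x]:
                    # t2 read t1's write of x ...
                    for y in writes:
                        if y in reads and reads[y] < writes[y]:
                            # ... but an older version of y: fractured read
                            return True
    return False
```

Timestamps here are any totally ordered values; in the Maude model they are (address, sequence-number) pairs compared by the `<` on timestamps.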
The function abortedRead checks whether a transaction TID2 reads a version TSX that was written by an aborted (flag false) transaction TID1:

```
op abortedRead : Record -> Bool .
eq abortedRead(LOG ;
      record(TID1, T1, T1', RS1, (version(X,VX,TSX,MDX), VS), false ) ; LOG' ;
      record(TID2, T2, T2', (version(X,VX,TSX,MDX), VS), WS2, true) ; LOG'') = true .
eq abortedRead(LOG ; record(TID2,...) ; LOG' ; record(TID1,...) ; LOG'') = true .
eq abortedRead(LOG) = false [owise] .
```
*No Lost Updates.* We analyze the PLU property by searching for a final state in which the monitor shows that an update was lost:

```
search [1] initConfig =>! C:Config < M:Address : Monitor | log: LOG:Record >
    such that lu(LOG) .
```
The function lu, described in [14], checks whether there are lost updates in LOG.

We have performed our analysis with 4 different initial states, with up to 8 transactions, 2 data items and 4 partitions, without finding a violation of RA or PLU. We have also model checked the causal consistency (CC) property with the same initial states, and found a counterexample showing that ROLA does *not* satisfy CC. (This might imply that our initial states are large enough so that violations of RA or PLU could have been found by model checking.) Each analysis command took about 30 seconds to execute on a 2.9 GHz Intel 4-Core i7-3520M CPU with 3.7 GB memory.

## **6 Statistical Model Checking of ROLA and Walter**

The weakest consistency model in [4] guaranteeing RA and PLU is PSI, and the main system providing PSI is Walter [20]. ROLA must therefore outperform Walter to be an attractive design. To quickly check whether ROLA does so, we have also modeled Walter—without its data replication features—in Maude (see [11] and https://sites.google.com/site/fase18submission/maude-spec), and use statistical model checking with PVeStA to compare the performance of ROLA and Walter in terms of throughput and average transaction latency.

**Extracting Performance Measures from Executions.** PVeStA estimates the expected (average) value of an expression on a run, up to a desired statistical confidence. The key to performing statistical model checking is therefore to define a measure on runs. Using the monitor in Sect. 5, we can define a number of functions on (states with) such a monitor that extract different performance metrics from this "system execution log."

The function throughput computes the number of committed transactions per time unit. committedNumber computes the number of committed transactions in LOG, and totalRunTime returns the time when all transactions are finished (i.e., the largest *finishTime* in LOG):

```
op throughput : Config -> Float [frozen] .
eq throughput(< M : Monitor | log: LOG > REST)
 = committedNumber(LOG) / totalRunTime(LOG) .
```
The function avgLatency computes the average transaction latency by dividing the sum of the latencies of all committed transactions by the number of such transactions:

```
op avgLatency : Config -> Float [frozen] .
eq avgLatency(< M : Monitor | log: LOG > REST)
 = totalLatency(LOG) / committedNumber(LOG) .
```
where totalLatency computes the sum of all transaction latencies (time between the issue time and the finish time of a committed transaction).
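Both metrics are straightforward folds over the log. A Python sketch, using the record layout of Sect. 5 with tuples standing in for the Maude records:

```python
# Illustrative versions of the two metric functions over an execution log.
# Each record: (tid, issue_time, finish_time, reads, writes, committed).
def throughput(log):
    committed = [r for r in log if r[5]]
    total_run_time = max(r[2] for r in log)   # largest finishTime in the log
    return len(committed) / total_run_time

def avg_latency(log):
    committed = [r for r in log if r[5]]
    # latency = time between issue and finish of a committed transaction
    return sum(r[2] - r[1] for r in committed) / len(committed)
```

PVeStA then estimates the expected value of these functions over many sampled runs.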

**Generating Initial States.** We use an operator init to *probabilistically* generate initial states: init(*rtx*, *wtx*, *rwtx*, *part*, *keys*, *rops*, *wops*, *rwops*, *distr*) generates an initial state with *rtx* read-only transactions, *wtx* write-only transactions, *rwtx* read-write transactions, *part* partitions, *keys* data items, *rops* operations per read-only transaction, *wops* operations per write-only transaction, *rwops* operations per read-write transaction, and *distr* the key access distribution (the probability that an operation accesses a certain data item). To capture the fact that some data items may be accessed more frequently than others, we also use Zipfian distributions in our experiments.
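The Zipfian key-access distribution mentioned above can be sketched as a generic inverse-transform sampler over rank weights 1/k^s; the function name and parameters are illustrative, not part of the paper's model.

```python
# Illustrative Zipfian key sampler: the key of rank k (1-based) is drawn
# with probability proportional to 1 / k**s, so low-rank keys are "hot".
import random

def zipf_key(keys, s=1.0, rng=random):
    weights = [1.0 / (k + 1) ** s for k in range(len(keys))]
    r = rng.random() * sum(weights)
    for key, w in zip(keys, weights):
        r -= w
        if r <= 0:
            return key
    return keys[-1]   # guard against floating-point rounding
```

With s = 0 this degenerates to the uniform distribution, so one sampler covers both workloads used in the experiments.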

**Statistical Model Checking Results.** We performed our experiments under different configurations, with 200 transactions, 2–4 operations per transaction, up to 200 data items and 50 partitions, with lognormal message delay distributions, and with uniform and Zipfian data item access distributions.

The plots in Fig. 1 show the *throughput* as a function of the percentage of read-only transactions, the number of partitions, and the number of keys (data items), sometimes with both uniform and Zipfian distributions. The plots show that ROLA outperforms Walter for all parameter combinations. More partitions give ROLA higher throughput (since concurrency increases), but reduce Walter's (since Walter has to propagate transactions to more partitions to advance the vector timestamp). We only plot the results under uniform key access distribution; they are consistent with the results using Zipfian distributions.

The plots in Fig. 2 show the *average transaction latency* as a function of the same parameters as the throughput plots. Again, we see that ROLA outperforms Walter in all settings. The difference is especially large for write-heavy workloads; the reason is that Walter incurs increasing overhead for providing causality, which requires background propagation to advance the vector timestamp. The latencies tend to converge under read-heavy workloads (because reads in both ROLA and Walter can commit locally without certification), but ROLA still has noticeably lower latency than Walter.

**Fig. 1.** Throughput comparison under different workload conditions.

Computing the probabilities took 6 hours (worst case) on 10 servers, each with a 64-bit Intel Quad Core Xeon E5530 CPU with 12 GB memory. Each point in the plots represents the average of three statistical model checking results.

## **7 Related Work**

Maude and PVeStA have been used to model and analyze the correctness and performance of a number of distributed data stores: the Cassandra key-value store [12,15], different versions of RAMP [10,13], and Google's Megastore [7,8]. In contrast to these papers, our paper uses formal methods to develop and validate an entirely new design, ROLA, for a new consistency model.

Concerning formal methods for distributed data stores, engineers at Amazon have used TLA+ and its model checker TLC to model and analyze the correctness of key parts of Amazon's celebrated cloud computing infrastructure [17]. In contrast to our work, they only use formal methods for correctness analysis; indeed, one of their complaints is that they cannot use their formal method for performance estimation. The designers of the TAPIR transaction protocol for distributed storage systems have also specified and model checked correctness (but not performance) properties of their design using TLA+ [22].

**Fig. 2.** Average latency comparison across varying workload conditions.

#### **8 Conclusions**

We have presented the formal design and analysis of ROLA, a distributed transaction protocol that supports a new consistency model not present in the survey by Cerone *et al.* [4]. Using formal modeling and both standard and statistical model checking analyses we have: (i) validated ROLA's RA and PLU consistency requirements; and (ii) analyzed its performance requirements, showing that ROLA outperforms Walter in all performance measures.

This work has shown, to the best of our knowledge for the first time, that the design and validation of a *new* distributed transaction protocol can be achieved relatively quickly *before* its implementation by the use of formal methods. Our next planned step is to implement ROLA, evaluate it experimentally, and compare the experimental results with the formal analysis ones. In previous work on existing systems such as Cassandra [9] and RAMP [3], the performance estimates obtained by formal analysis and those obtained by experimenting with the real system were basically in agreement with each other [10,12]. This confirmed the useful predictive power of the formal analyses. Our future research will investigate the existence of a similar agreement for ROLA.

**Acknowledgments.** We thank Andrea Cerone, Alexey Gotsman, Jatin Ganhotra, and Rohit Mukerji for helpful early discussions on this work, and the anonymous reviewers for useful comments. This work was supported in part by the following grants: NSF CNS 1409416, NSF CNS 1319527, AFOSR/AFRL FA8750-11-2-0084, and a generous gift from Microsoft.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Process Network Model for Reactive Streaming Software with Deterministic Task Parallelism**

Fotios Gioulekas<sup>1</sup>, Peter Poplavko<sup>2</sup>, Panagiotis Katsaros<sup>1,5(B)</sup>, Saddek Bensalem<sup>3</sup>, and Pedro Palomo<sup>4</sup>

<sup>1</sup> Aristotle University of Thessaloniki, Thessaloniki, Greece *{*gioulekas,katsaros*}*@csd.auth.gr <sup>2</sup> Mentor®, A Siemens Business, Montbonnot, France petro.poplavko@siemens.com <sup>3</sup> Université Grenoble Alpes (UGA), VERIMAG, Grenoble, France Saddek.Bensalem@univ-grenoble-alpes.fr <sup>4</sup> Deimos Space®, Madrid, Spain pedro.palomo@deimos-space.com <sup>5</sup> Information Technology Institute, Centre of Research and Technology, Thessaloniki, Greece

**Abstract.** A formal semantics is introduced for a Process Network model that combines streaming and reactive control processing with task-parallelism properties suitable for exploiting multi-cores. Applications that react to environment stimuli are implemented by communicating sporadic and periodic tasks, programmed independently from an execution platform. Two functionally equivalent semantics are defined, one for sequential execution and one for real-time execution. The former ensures functional determinism by implying precedence constraints between jobs (task executions); hence, the program outputs are independent of the task scheduling. The latter specifies concurrent execution on a real-time platform, guaranteeing all of the model's constraints; it has been implemented in an executable formal specification language. The model's implementation runs on multi-core embedded systems, and supports the integration of runtime managers for shared HW/SW resources (e.g., for controlling QoS, resource interference, or power consumption). Finally, a model transformation approach has been developed, which allowed us to port and statically schedule a real spacecraft on-board application on an industrial multi-core platform.

**Keywords:** Process network · Stream processing · Reactive control · Real-time

The research leading to these results has received funding from the European Space Agency project MoSaTT-CMP, Contract No. 4000111814/14/NL/MH.

© The Author(s) 2018

A. Russo and A. Schürr (Eds.): FASE 2018, LNCS 10802, pp. 94–110, 2018. https://doi.org/10.1007/978-3-319-89363-1_6

#### **1 Introduction**

The proliferation of multi-cores in timing-critical embedded systems requires a programming paradigm that addresses the challenge of ensuring predictable timing. Two prominent paradigms and a variety of associated languages are widely used today. For streaming signal processing, synchronous dataflow languages [18] allow writing programs in the form of directed graphs with nodes for their functions and arcs for the data flows between functions. Such programs can exploit concurrency when they are deployed to multi-cores [15], while their functions can be statically scheduled [17] to ensure a predictable timing behavior.

On the other hand, the reactive-control synchronous languages [12] are used for reactive systems (*e.g.,* flight control systems) expected to react to stimuli from the environment within strict time bounds. The synchronicity abstraction eliminates the non-determinism from the interleaving of concurrent behaviors.

The synchronous languages lack appropriate concepts for task parallelism and timing-predictable scheduling on multiprocessors, whereas the streaming models do not support reactive behavior. The *Fixed Priority Process Network* (FPPN) model of computation has been proposed as a trade-off between streaming and reactive control processing, for task parallel programs. In FPPNs, task invocations depend on a combination of periodic data availability (similar to streaming models) and sporadic control events. Static scheduling methods for FPPNs [20] have demonstrated a predictable timing on multi-cores. A first implementation of the model [22] in an executable formal specification language called BIP (Behavior, Interaction, Priority) exists, more specifically in its real-time dialect [3] extended to tasks [10]. In [21], the FPPN scheduling was studied by taking into account resource interference; an approach for incrementally plugging online schedulers for HW/SW resource sharing (*e.g.,* for QoS management) was proposed.

This article presents the first comprehensive FPPN semantics definition, at two levels: semantics for sequential execution, which ensures functional determinism, and a real-time semantics for concurrent task execution while adhering to the constraints of the former semantics. Our definition is related to a new model transformation framework, which enables programming at a high level by embedding FPPNs into the architecture description, and allows an incremental refinement in terms of task interactions and scheduling<sup>1</sup>. Our approach is demonstrated with a real spacecraft on-board application ported onto the European Space Agency's quad-core Next Generation Microprocessor (NGMP).

#### **2 Related Work**

Design frameworks for embedded applications, like Ptolemy II [6] and PeaCE [11], allow designing systems through refining high-level models. They are based on various models of computation (MoC), but we focus mainly on those that support task scheduling with timing constraints. Dataflow MoCs that

<sup>1</sup> The framework is online at [2].

stem from the Kahn Process Networks [16] have been adapted for the timing constraints of signal processing applications, and design frameworks like CompSoC [13] have been introduced; these MoCs do not support reactive behavior and sporadic tasks as the FPPN MoC does, which can be seen as an extension in that direction. DOL Critical [10] ensures predictable timing, but its functional behavior depends on scheduling. Another timing-aware reactive MoC that does not guarantee functional determinism is the DPML [4]. The Prelude design framework [5] specifies applications in a synchronous reactive MoC, but due to its expressive power it is hard to derive scheduling analyses, unless its semantics is restricted. Last but not least, though the reactive process networks (RPN) [8] do not support scheduling with timing constraints, they lay an important foundation for combining the streaming and reactive control behaviors. In the FPPN semantics we reuse an important principle of RPN semantics, namely, performing the *maximal execution run* of a dataflow network in response to a control event.

## **3 A PN Model for Streaming and Reactive Control**

An FPPN model is composed of *Processes*, *Data Channels* and *Event Generators*.

A *Process* represents a software subroutine that operates with internal variables and input/output channels connected to it through ports. The *functional code* of the application is defined in processes, whereas the necessary *middleware* elements of the FPPN are channels, event generators, and *functional priorities*, which define a relation between the processes to ensure deterministic execution.

An example process is shown in Fig. 1. The process first performs a check on its internal variables; if the check succeeds, it reads from the input channel and, if the value read *is valid* (refer to the channel definition below), computes its square and writes the result to the output channel. A call to the process subroutine is referred to as a *job*. Like real-time jobs, the subroutine should have a bounded execution time, subject to WCET (worst-case execution time) analysis.

```
void SQ_Inititialize() {
    SQ_index = 0;
    SQ_length = 200;
}
void SQ_PeriodicJob() {
    float x, y;
    bool x_valid, y_valid;
    if (SQ_index < SQ_length) {
        XIF_Read(&x, &x_valid);
        if (x_valid == true) {
            y = x * x;
            y_valid = true;
            YIF_Write(&y);
        }
    }
    SQ_index++;
}
```
**Fig. 1.** Example code for "Square" process

An FPPN is defined by two directed graphs. The first is a (possibly cyclic) graph (P, C), whose nodes P are processes and edges C are channels for pairs of communicating processes with a dataflow direction, *i.e.,* from the writer to the reader (there are also external channels interacting with the environment).

**Fig. 2.** Example Fixed Priority Process Network

A channel is denoted by c ∈ C or by the pair (p1, p2) of its writer and reader. For p1 the channel is said to be an output and for p2 an input. The second graph (P, FP) is the functional priority directed acyclic graph (DAG), which defines a functional priority relation between processes. For any two communicating processes we require,

$$(p\_1, p\_2) \in C \implies (p\_1, p\_2) \in \mathcal{FP} \lor (p\_2, p\_1) \in \mathcal{FP}$$

*i.e.,* a functional priority either follows the direction of the dataflow or the opposite one. Given (p1, p2) ∈ FP, p1 is said to have a *higher priority* than p2.

The FPPN in Fig. 2 represents an imaginary data processing application, where the sporadic process "X" generates values, "Square" calculates the square of the received value, and the periodic process "Y" serves as a sink for the squared value. A sporadic event (a command from the environment) invokes "X", which is annotated by its minimal inter-arrival time. The periodic processes are annotated by their periods. The two types of non-blocking channels are also illustrated: the FIFO (or mailbox) has the semantics of a queue, whereas the blackboard remembers the last written value, which can be read multiple times. The arc depicted above the channels indicates the functional priority relation FP. Additionally, the external input/output channels are shown. In this example, the dataflow in the channels goes in the opposite direction of the functional priority order. Note that, by analogy to scheduling priorities, a convenient method to define priority is to assign a unique priority index to every process: the smaller the index, the higher the priority. This method is demonstrated in Fig. 2. In this case, the minimal required FP relation is defined by joining each pair of communicating processes by an arc going from the higher-priority process to the lower-priority one.
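The priority-index convention just described can be sketched in C. The following is an illustrative encoding (the function `minimal_fp` and the integer-id data layout are our assumptions, not part of the FPPN middleware): for each channel, an FP arc is added from the process with the smaller index (higher priority) to the other one.

```c
#include <stdbool.h>

#define NP 8  /* maximum number of processes in this sketch */

typedef struct { int writer, reader; } Channel;

/* fp[i][j] == true means process i -> process j,
 * i.e., i has higher functional priority than j. */
void minimal_fp(const Channel *ch, int nc, const int prio[NP],
                bool fp[NP][NP]) {
    for (int i = 0; i < NP; i++)
        for (int j = 0; j < NP; j++)
            fp[i][j] = false;
    for (int c = 0; c < nc; c++) {
        int a = ch[c].writer, b = ch[c].reader;
        /* smaller priority index = higher priority */
        if (prio[a] < prio[b]) fp[a][b] = true;
        else                   fp[b][a] = true;
    }
}
```

Only communicating pairs are joined, matching the minimal requirement of the FP relation.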

Let us denote by *Var* the set of all variables. For a variable x or an ordered set (vector) X of variables we denote by **D**(x) (resp. **D**(X)) its domain (or vector of domains), *i.e.,* the set(s) of values that the variable(s) may take. Valuations of variables X are shown as X<sup>0</sup>, X<sup>1</sup> ..., or simply as X, dropping the superscript. Each variable is assumed to have a unique initial valuation. From the software point of view, this means that all variables are initialized by a default value.

*Var* includes all *process state* variables Xp and the *channel state* variables γc. The current valuation of a state variable is often referred to simply as the *state*. For each channel c, an alphabet Σc and a type *CT*c are defined; a *channel type* consists of a write 'operation' (Wc) and a read 'operation' (Rc), defined as functions specifying the evolution of the channel variable. The function Wc : **D**(c) × Σc → **D**(c) defines the update after writing a symbol s ∈ Σc to the channel, whereas Rc : **D**(c) → **D**(c) × Σc maps the channel state to a pair (Rc1, Rc2), where Rc1 is the new channel state and Rc2 is the symbol read from the channel. For a FIFO channel, the state γc is an (initially empty) string, and the write operation left-concatenates the symbol s to the string: Wc(γc, s) = s ◦ γc. For the same channel, Rc(γc ◦ s) = (γc, s), *i.e.,* we read and remove the last symbol of the string. The write and read functions are defined for every possible channel state, thus rendering the channels non-blocking. This is implemented by including ⊥ in the alphabet, in order to define the read operation when the channel does not contain any 'meaningful' data. Thus, reading from an empty FIFO is defined by Rc(ε) = (ε, ⊥), where ε denotes the empty string. For a blackboard channel, the state is an (initially empty) string that contains at most one symbol – the last symbol written to the channel: Wc(γc, s) = s, Rc(γc) = (γc, γc), Rc(ε) = (ε, ⊥).
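As a concrete illustration, the two channel types could be encoded in C as follows. This is a hypothetical sketch, not the actual middleware: the ⊥ symbol is represented by a `valid` output flag, and the FIFO is bounded by an arbitrary capacity `CAP`.

```c
#include <stdbool.h>
#include <string.h>

#define CAP 16

/* FIFO channel state: a bounded queue of float samples.  Reading from an
 * empty FIFO yields "bottom" (valid == false), making it non-blocking. */
typedef struct { float buf[CAP]; int len; } Fifo;

void fifo_write(Fifo *c, float s) {       /* W_c: left-concatenate s */
    if (c->len < CAP) {
        memmove(&c->buf[1], &c->buf[0], c->len * sizeof(float));
        c->buf[0] = s;
        c->len++;
    }
}

void fifo_read(Fifo *c, float *s, bool *valid) { /* R_c: remove last symbol */
    if (c->len > 0) { *s = c->buf[--c->len]; *valid = true; }
    else            { *valid = false; }          /* empty: read yields "bottom" */
}

/* Blackboard channel: remembers the last written value, which can be
 * read any number of times without being consumed. */
typedef struct { float val; bool full; } Blackboard;

void bb_write(Blackboard *c, float s) { c->val = s; c->full = true; }

void bb_read(Blackboard *c, float *s, bool *valid) {
    *valid = c->full;
    if (c->full) *s = c->val;
}
```

With this encoding, the FIFO returns values in the order they were written, while the blackboard keeps returning the most recent one.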

An *external channel*'s state is an infinite sequence of samples, *i.e.,* variables c[1], c[2], c[3], ... with the same domain. For a sample c[k], k is the *sample index*. Though the sequence is infinite, no infinite memory is required, because each sample can be accessed (as will be shown) only within a limited time interval. If c is an external output, the channel type defines the sample write operation in the form Wc : **D**(c) × N+ × Σc → **D**(c), where **D**(c) is the sample domain, the second argument is the sample index, and the result is the new sample value. For an external input, we have the sample read operation Rc : **D**(c) × N+ → **D**(c) × Σc. The set of outputs is denoted by O and the set of inputs by I.

The program expressions involve variables. Let us call *Act* the set of all possible *actions* that represent operations on variables. An assignment is an action written as Y := f(X). For the channels, two types of actions are defined: x!c for writing a variable x to channel c, and x?c for reading from the channel into x, where **D**(x) = Σc. For external channels, we have x![k]c, c ∈ O and y?[k]c, c ∈ I, where [k] is the sample index. Actions are defined by a function *Effect* : *Act* × **D**(*Var*) → **D**(*Var*), which for every action a states how the new values of all variables are calculated from their previous values. The *actions are assumed to have zero delay*. The physical time is modeled by a special action **w**(τ) for waiting until time stamp τ.

An *execution trace* α ∈ *Act*<sup>∗</sup> is a sequence of actions, *e.g.,*

$$\alpha = \mathbf{w}(0),\; x?_{[1]}I_1,\; x := x^2,\; x!c_1,\; \mathbf{w}(100),\; y?c_1,\; y!_{[2]}O_1$$

The time stamps in the execution are non-decreasing, and denote the time until the next time stamp, at which the following actions occur. In the example, at time 0 we read sample [1] from I<sup>1</sup> and we compute its square. Then we write to channel c1. At time 100, we read from c<sup>1</sup> and write the sample [2] to O1.

A process models a subroutine with a set of locations (code line numbers), variables (data) and operators that define a guard on variables ('if' condition), the action (operator body) and the transfer of control to the next location.

**Definition 1 (Process).** *Each process* p *is associated with a deterministic transition system* (ℓp0, Lp, Xp, Xp0, Ip, Op, Ap, Tp)*, with* Lp *a set of locations,* ℓp0 ∈ Lp *an initial location, and* Xp *the set of state variables with initial values* Xp0*.* Ip, Op *are (internal and external) input/output channels.* Ap *is a set of actions with variable assignments for* Xp*, reads from* Ip*, and writes to* Op*.* Tp *is a transition relation* Tp ⊆ Lp × Gp × Ap × Lp*, where* Gp *is the set of predicates (guarding conditions) defined on the variables from* Xp*.*

One *execution step* (ℓ1, X1, γ1) −g:a→ (ℓ2, X2, γ2), for valuations X1, X2 of the variables in Xp and valuations γ1, γ2 of the channels in Ip ∪ Op, implies that there is a transition (ℓ1, g, a, ℓ2) ∈ Tp such that X1 satisfies the guarding condition g (*i.e.,* g(X1) = True) and (X2, γ2) = *Effect*(a, (X1, γ1)).

Definition 1 prescribes a deterministic transition system: at each location ℓ, the guarding conditions enable a single execution step for each possible valuation Xi.

**Definition 2 (Process job execution).** *A job execution* (X1, γ1) −α→p (X2, γ2) *is a non-empty sequence of process* p *execution steps starting and ending in* p*'s initial location* ℓ0*, without intermediate occurrences of* ℓ0*:*

$$(\ell^0, X^1, \gamma^1) \xrightarrow{g_1:\alpha_1} (\ell_1, X_1, \gamma_1) \dots \xrightarrow{g_n:\alpha_n} (\ell^0, X^2, \gamma^2), \quad \text{for } n \ge 1,\ \ell_i \neq \ell^0$$

From a software point of view, a job execution is seen as a subroutine run from a caller location that returns control back to the caller. We assume that at <sup>k</sup>-th job execution, external channels <sup>I</sup>p, Op are read/written at sample index [k].

In an FPPN, there is a one-to-one mapping between every process p and the respective event generator e that defines the constraints of interaction with the environment. Every <sup>e</sup> is associated with (possibly empty) subsets <sup>I</sup>e, Oe of the external input/output (I/O) channels. Those are the external channels that the process <sup>p</sup> can access: <sup>I</sup>e ⊆ Ip, <sup>O</sup>e ⊆ Op. The I/O sets of different event generators are disjoint, so different processes cannot share external channels.

Every e defines the set of possible sequences of time stamps τk for the 'event' of the k-th invocation of process p, and a relative deadline de ∈ Q+. The intervals [τk, τk + de] determine when the k-th job execution can occur. This timing constraint has two important reasons. First, if the subsets Ie or Oe are not empty, then these intervals indicate the timing windows in which the environment opens the k-th sample in the external I/O channels for read or write access at the k-th job execution. Secondly, τk defines the order in which the k-th job should execute: the earlier it is invoked, the earlier it should execute. Concerning the τk sequences, two event generator types are considered, namely *multi-periodic* and *sporadic*. Both are parameterized by a burst size me and a period Te. Bursts of me periodic events occur at 0, Te, 2Te, etc. For sporadic events, at most me events can occur in any half-closed interval of length Te. In the sequel we associate the attributes of an event generator with the corresponding process, *e.g.,* Tp and dp.
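The two constraints on the τk sequences can be sketched in C. This is an illustrative helper (the function names and the array-based encoding are our assumptions): a non-decreasing stamp sequence is sporadic-feasible with burst size m and period T iff τ(k+m) − τ(k) ≥ T for every k, since otherwise m + 1 events would fall into one half-closed interval of length T.

```c
#include <stdbool.h>

/* Check the sporadic constraint: at most m events in any half-closed
 * interval of length T.  tau[] must be sorted in non-decreasing order. */
bool sporadic_ok(const double tau[], int n, int m, double T) {
    for (int k = 0; k + m < n; k++)
        if (tau[k + m] - tau[k] < T)
            return false;   /* events k..k+m would share one interval */
    return true;
}

/* Multi-periodic generator: bursts of m events occur at 0, T, 2T, ...;
 * the k-th event (0-based) is stamped (k / m) * T. */
double periodic_stamp(int k, int m, double T) {
    return (k / m) * T;     /* integer division groups events into bursts */
}
```

For example, with m = 2 and T = 10, events 0 and 1 are stamped 0 and events 2 and 3 are stamped 10.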

**Definition 3 (FPPN).** *An* FPPN *is a tuple* PN = (P, C, FP, ep, Ie, Oe, de, Σc, *CT*c)*, where* P *is a set of processes and* C ⊆ P × P *is a set of internal channels, with* (P, C) *defining a (possibly cyclic) directed graph. An acyclic directed graph* (P, FP) *is also defined, with* FP ⊂ P × P *a functional priority relation (if* (p1, p2) ∈ FP*, we also write* p1 → p2*). This relation must be defined at least for processes accessing the same channel,* i.e., (p1, p2) ∈ C ⇒ p1 → p2 ∨ p2 → p1*.* ep *maps every process* p *to a unique event generator, whereas* Ie *and* Oe *map each event generator to (possibly empty) partitions of the global set of external input channels* I *and output channels* O*, respectively.* de *defines the relative deadline for accessing the I/O channels of generator* e*,* Σc *defines the alphabets for internal and external I/O channels, and CT*c *specifies the channel types.*

The priority FP defines the order in which two processes are executed *when invoked at the same time*. It is not necessarily a transitive relation. For example, if (p1, p2) ∈ FP, (p2, p3) ∈ FP, and both p<sup>1</sup> and p<sup>3</sup> get invoked simultaneously then FP does not imply any execution-order constraint between them unless p<sup>2</sup> is also invoked at the same time. The functional priorities differ from the scheduling priorities. The former disambiguate the order of read/write accesses to internal channels, whereas the latter ensure satisfaction of timing constraints.

## **4 Zero-Delay Semantics for the FPPN Model**

The functional determinism requirement prescribes that the data sequences and time stamps at the outputs are a well-defined function of the data sequences and time stamps at the inputs. This is ensured by the so-called functional priorities. In essence, functional priorities control the job execution order, equivalently to the effect of fixed priorities on a set of tasks under uniprocessor fixed-priority scheduling with zero task execution times. A distinct feature of the FPPN model is that priorities are not used directly in scheduling, but rather in the definition of the model's semantics. From now on, the term 'task' will refer to an FPPN process. Following the usual real-time systems terminology, invoking a task implies the generation of a job which has to be executed before the task's deadline. The so-called *precedence constraints*, *i.e.,* the semantic restrictions on the FPPN job execution order, follow firstly from the time stamps at which the tasks are invoked and secondly from the functional priorities. In this section, we define these constraints in terms of a sequential order (an execution trace).

The FPPN model requires that *all simultaneous process invocations be signaled synchronously*. This can be realized by introducing a periodic clock with a sufficiently small period (the gcd of all Tp), such that invocation events can only occur at clock ticks, synchronously. Two variant semantics are then defined, namely the *zero-delay* and the *real-time* semantics.
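The common clock period can be computed as the gcd of the process periods; a small C sketch, under the assumption (made only for this illustration) that periods are given as integers, e.g. in microseconds:

```c
/* gcd of two non-negative integers (Euclid's algorithm). */
static long gcd2(long a, long b) {
    while (b) { long t = a % b; a = b; b = t; }
    return a;
}

/* Period of the synchronous invocation clock: the gcd of all process
 * periods, so that every invocation falls exactly on a clock tick. */
long clock_period(const long T[], int n) {
    long g = T[0];
    for (int i = 1; i < n; i++)
        g = gcd2(g, T[i]);
    return g;
}
```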

The *zero-delay semantics* imposes an ordering on the job executions, assuming that they have zero delay and are never postponed to the future. Since in this case the deadlines are always met even without exploiting parallelism, a sequential execution of processes is considered for simplicity. The semantics is defined in terms of rules for constructing the execution trace of the FPPN for a given sequence (t1, **P**1), (t2, **P**2), ..., where t1 < t2 < ... are time stamps and **P**i is the multiset of processes invoked at time ti. For convenience, we associate each 'invoked process' p in **P**i with the respective invocation event, ep. The execution trace has the form:

$$Trace(\mathcal{P}\mathcal{N}) = \mathbf{w}(t\_1) \circ \alpha^1 \circ \mathbf{w}(t\_2) \circ \alpha^2 \dots$$

where α<sup>i</sup> is a concatenation of job executions of processes in **P**<sup>i</sup> included in an order, such that if p<sup>1</sup> → p<sup>2</sup> then the job(s) of p<sup>1</sup> execute earlier than those of p2.
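With unique priority indices, one order consistent with FP is simply ascending index order. The zero-delay execution of the jobs invoked at one time stamp can then be sketched in C (an illustrative encoding; the `Invocation` structure and `maximal_run` are our assumptions):

```c
#include <stdlib.h>

/* One pending job: the process's unique priority index (smaller index =
 * higher functional priority) and its subroutine. */
typedef struct {
    int priority;
    void (*job)(void *ctx);  /* the process subroutine (one job) */
    void *ctx;
} Invocation;

static int by_priority(const void *a, const void *b) {
    return ((const Invocation *)a)->priority
         - ((const Invocation *)b)->priority;
}

/* Maximal execution run for one time stamp: execute every pending job,
 * higher-priority (smaller-index) jobs first. */
void maximal_run(Invocation *inv, int n) {
    qsort(inv, n, sizeof(Invocation), by_priority);
    for (int i = 0; i < n; i++)
        inv[i].job(inv[i].ctx);
}
```

Repeating `maximal_run` for each time stamp t1, t2, ... yields the sequential trace of the zero-delay semantics.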

#### **Definition 4 (Configuration).** *An FPPN configuration* (π, γ, **P**) *consists of: a mapping* π *from every process* p ∈ P *to the current valuation* π(p) *of its state variables; the current valuation* γ *of all channel states; and the set* **P** *of pending invocation events* ep*.*
Executing one job in a process network:

$$\frac{(\pi(p),\gamma) \stackrel{\alpha}{\longrightarrow}_{p} (X',\gamma') \;\wedge\; e_{p} \in \mathbf{P} \;\wedge\; \nexists p': e_{p'} \in \mathbf{P} \wedge (p',p) \in \mathcal{FP}}{(\pi,\gamma,\mathbf{P}) \stackrel{\alpha}{\longrightarrow}_{\mathcal{PN}} (\pi\{X'/p\},\gamma',\mathbf{P}\setminus\{e_{p}\})}$$

where π{X'/p} is obtained from π by replacing the state of p by X'.

Given a non-empty set of events **P** invoked at time t, a *maximal execution run* of a process network is defined by a sequence of job executions that continues until the set of pending events is empty.

$$\frac{(\pi^0, \gamma^0, \mathbf{P}) \stackrel{\alpha_1}{\longrightarrow}_{\mathcal{PN}} (\pi_1, \gamma_1, \mathbf{P}\setminus\{e_{p_1}\}) \stackrel{\alpha_2}{\longrightarrow}_{\mathcal{PN}} \dots (\pi^1, \gamma^1, \emptyset)}{(\pi^0, \gamma^0) \stackrel{\mathbf{w}(t) \circ \alpha_1 \circ \alpha_2 \circ \dots}{\longmapsto}_{\mathcal{PN}(\mathbf{P})} (\pi^1, \gamma^1)}$$

Given an initial configuration (π0, γ0) and a sequence (t1, **P**1), (t2, **P**2), ... of events invoked at times t1 < t2 < ..., the run of the process network is defined by a sequence of maximal runs that occur at the specified time stamps.

$$\operatorname{Run}(\mathcal{PN}) = (\pi^0, \gamma^0) \stackrel{\alpha^1}{\longmapsto}_{\mathcal{PN}(\mathbf{P}^1)} (\pi^1, \gamma^1) \stackrel{\alpha^2}{\longmapsto}_{\mathcal{PN}(\mathbf{P}^2)} \dots$$

The execution trace of a process network is a projection of the process network run to actions:

$$Trace(\mathcal{P}\mathcal{N}) = \alpha^1 \circ \alpha^2 \dots$$

This trace represents the time stamps (**w**(t1), **w**(t2), ...) and the data processing actions executed at every time stamp. From the effect of these actions it is possible to determine the sequence of values written to the internal and external channels. These values depend on the states of the processes and internal channels. The concurrent activities – the job executions – that modify the process and channel states are themselves deterministic and are ordered relative to each other in a way that is completely determined by the time stamps and the FP relation. Therefore, we can make the following claim.

**Proposition 1 (Functional determinism).** *The sequences of values written at all external and internal channels are functionally dependent on the time stamps of the event generators and on the data samples at the external inputs.*

Basically, this property means that the outputs calculated by an FPPN depend only on the event invocation times and the input data sequences, but not on the scheduling. To exploit task parallelism, the real-time semantics of Sect. 5 relaxes the sequential order of execution and the zero-delay assumption.

## **5 Real-Time Semantics for the FPPN Model**

In the real-time semantics, job executions last for some physical time and can start concurrently with each other at any time after their invocation. Precedence constraints are respected that impose, for certain jobs, the same relative order of execution as in the zero-delay semantics, so that non-deterministic updates of the states of processes and channels are excluded. To ensure timeliness, the jobs should complete their execution within the deadline after their invocation. The semantics specifies the entities for communication, synchronization and scheduling, and is defined by compilation to an *executable* formal specification language.

Our approach is based on (real-time) 'BIP' [3] for modeling networks of connected timed automata components [24]. We adopt the extension in [10], which introduces the concept of *continuous* (asynchronous) automata transitions, which, unlike the default (discrete) transitions, take a certain physical time. Next to the support of tasks (via continuous transitions), BIP supports urgency in timing constraints; these are the timed-automata features required for adequate modeling and timing verification of dataflow languages [9]. An important BIP language feature for implementing the functional code of tasks is the possibility to specify data actions in an imperative programming language (C/C++).

Figure 3 illustrates how an FPPN process is compiled into a BIP component. The source code is parsed, searching for the primitives that are relevant for the interactions of the process with other components, namely the reads and writes from/to the data channels. For those primitives the generated BIP component gets ports, *e.g.,* 'XIF Read(IN x, IN valid)', through which the respective transitions inside the component synchronize and exchange data with other components. In line with Definition 1, every job execution corresponds to a sequence of transitions that starts and ends in an initial location. The first transition in this sequence, 'Start', is synchronized with the event generator component, which enables this transition only after the process has been invoked. The event generator shown in Fig. 3 is a simplified variant for periodic tasks whose deadline equals the period. In [22] we also describe how internal channels are modeled and give more details on event generator modelling.

To ensure a functional behavior equivalent to zero-delay semantics, the job executions have to satisfy precedence constraints between subsequent jobs of the same process, and the jobs of process pairs connected by a channel. In both

**Fig. 3.** Compilation of functional code to BIP

cases, the relative execution order of these subsets of jobs is dictated by zerodelay semantics, whereby the jobs are executed in the invocation order and the simultaneously invoked jobs follow the functional priority order. In this way, we ensure deterministic updates in both cases: (i) for the states of processes by *excluding auto-concurrency*, and (ii) for the data shared between the processes by *excluding data races* on the channels. The precedence constraints for (i) are satisfied by construction, because BIP components for processes never start a new job execution until the previous job of the same process has finished. For the precedence constraints in (ii), an appropriate component is generated for each pair of communicating processes and plugged incrementally into the network of BIP components.
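The start condition implied by the two precedence-constraint cases can be sketched in C. This is a minimal illustration of the rule, not the actual BIP component; the `ProcState` structure and `job_may_start` are our assumptions.

```c
#include <stdbool.h>

/* A job of process p invoked at time t may start iff:
 *  (i)  the previous job of p has finished (no auto-concurrency), and
 *  (ii) every job invoked at the same time t by a process with higher
 *       functional priority has already finished (no data races). */
typedef struct {
    bool prev_job_running;   /* a job of this process is still executing */
} ProcState;

bool job_may_start(const ProcState *p,
                   const double *hp_invoke_times, /* invocation times of  */
                   const bool   *hp_finished,     /* higher-priority jobs */
                   int n_hp, double t) {
    if (p->prev_job_running)
        return false;                              /* rule (i)  */
    for (int i = 0; i < n_hp; i++)
        if (hp_invoke_times[i] == t && !hp_finished[i])
            return false;                          /* rule (ii) */
    return true;
}
```

Jobs of higher-priority processes invoked at *different* times do not block the start, matching the definition of FP for simultaneous invocations.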

Figure 4 shows such a component generated for a given pair of processes "A" and "B", assuming (A, B) ∈ FP. We saw in Fig. 3 that the evolution of a job execution goes through three steps: 'invoke', 'start' and 'finish'. The component handles the three steps of both processes in an almost symmetrical way, except in the method that determines whether a job is ready to start: if two jobs are simultaneously invoked, then first the job of process "A" gets ready and then, after it has executed, the job of "B" becomes ready. The "Functional Priority"

**Fig. 4.** Imposing precedence order between "A", "B" ("A" has higher functional priority)

component maintains two job queues<sup>2</sup> denoted Qα, where α ∈ {A, B} indicates a process selection. In our notation, ᾱ means 'the one other than α', *i.e.,* if α = A then ᾱ = B, and if α = B then ᾱ = A.

The component receives from the event generator of process 'α', at regular intervals with period δα, either 'Invoke α' or 'FalseInvoke α'. In the latter case (*i.e.,* no invocation), the job at the tail of the queue is 'pulled' away<sup>3</sup>.


<sup>2</sup> Queues are implemented by a circular buffer with the following operations:

<sup>–</sup> Allocate() picks an available (statically allocated) cell and gives reference to it

<sup>3</sup> Thanks to 'init α' and 'advance α', the queue tail always contains the next anticipated job, which is conservatively marked as non-active until 'Invoke α' transition.

#### **6 Model Transformation Framework**

The model-based design philosophy for embedded systems that we follow [14] is grounded on evolutionary design using models, which supports gradual refinement (refined models are more accurate than the models they refine) and the setting of real-time attributes that ensure predictable timing. Such a process allows considering various design scenarios and promotes late binding of design decisions. Our approach to refinement is based on *incremental component-based* models, where the system evolves by incrementally plugging in new components and transforming existing ones.

**Fig. 5.** Evolutionary design of time-critical systems using FPPNs

We propose such a design approach (Fig. 5), which takes as a starting point a set of tasks defined by their *functional code* and real-time attributes (*e.g.,* periods, deadlines, WCETs, job queue capacities). We assume that these tasks are encapsulated into software-architecture functional blocks, corresponding to FPPN processes. Before being integrated into a single *architectural model*, they can be compiled and tested separately, by functional simulation or by running on an embedded platform.

The high-level architecture description framework of our choice is the TASTE toolset [14,19], whose front-end tools are based on the AADL (Architecture Analysis & Design Language) syntax [7]. An *architecture model* in TASTE consists of functional blocks – so-called 'functions' – which interact with each other via pairs of interfaces (IF), 'required IF'/'provided IF', where the former performs a procedure call on the latter. In TASTE, the provided interfaces can be explicitly used for task invocations, *i.e.,* they may get attributes like 'periodic'/'sporadic', 'deadline' and 'period'. The FPPN processes are represented by TASTE 'functions' that 'provide' such interfaces, implementing the job execution of the respective task in C/C++. Our TASTE-to-BIP framework is available for download at [2].

The first refinement step is plugging the data channels for explicit communication between the processes. The data channels are also modeled as TASTE functions, whereas reads and writes are implemented via interfaces. We have

**Fig. 6.** Model and graph transformations for the FPPN semantics

amended the attributes of TASTE functions to reflect the priority index of processes and the parameters of FPPN channels, such as the capacity of FIFO channels. The resulting model can be compiled and simulated in TASTE.

The second and final refinement step is scheduling. To schedule on multi-cores while respecting the real-time semantics of FPPN, this step is preceded by a transformation from the TASTE architectural model into a BIP FPPN model. The transformation process implements the FPPN-to-BIP 'compilation' sketched in the previous section, and we believe it could be formalized by a set of *transformation rules*. For example, as illustrated in Fig. 6, one rule could say that if two tasks τ1 and τ2 are related by the FP relation, then their respective BIP components B1 and B2 are connected (via their 'Start' and 'Finish' ports) to a functional priority component.

The scheduling is done offline, by first deriving a task graph from the architectural model, taking into account the periods, functional priorities and WCETs of the processes. The task graph represents a maximal set of jobs invoked in a hyperperiod and their precedence constraints; it defines the invocation time and the deadline of jobs relative to the hyperperiod start time. The task graph derivation algorithm is detailed in [20].

**Definition 5 (Task Graph).** *A directed acyclic graph TG(J, E) whose nodes J = {J_i} are jobs defined by tuples J_i = (p_i, k_i, A_i, D_i, W_i), where p_i is the job's process, k_i is the job's invocation count, A_i ∈ Q≥0 is the invocation time, D_i ∈ Q+ is the absolute deadline, and W_i ∈ Q+ is the WCET. The k-th job of process p is denoted by p[k]. The edges E represent the precedence constraints.*
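For illustration, the enumeration of the node set J of one hyperperiod could be sketched as follows. The function name and the task-tuple layout are our own conventions, not the paper's API, and the derivation of the precedence edges E (detailed in [20]) is omitted:

```python
from math import lcm

def derive_jobs(tasks):
    """Enumerate the jobs of one hyperperiod as in Definition 5.

    `tasks` maps a process name to (period, deadline, wcet, offset);
    the precedence edges E are not derived here.
    """
    hyper = lcm(*(period for (period, _, _, _) in tasks.values()))
    jobs = []
    for p, (T, d, W, offset) in tasks.items():
        for k in range(hyper // T):
            A = offset + k * T   # invocation time A_i, relative to hyperperiod start
            D = A + d            # absolute deadline D_i
            jobs.append((p, k, A, D, W))   # J_i = (p_i, k_i, A_i, D_i, W_i)
    return hyper, jobs
```

For example, two tasks with periods 4 and 6 yield a hyperperiod of 12 with 3 + 2 = 5 jobs.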

The task graph is given as input to a static scheduler. The schedule obtained from the static scheduler is translated into parameters for the *online scheduler* (cf. Fig. 6), which, on top of the functional priority components, further constrains the job execution order and timing, with the purpose of ensuring deadline satisfaction. The joint application/scheduler BIP model is called the System Model. This model is eventually compiled and linked with the BIP-RTE, which ensures correct BIP semantics of all components online [23].

## **7 Case Study: Guidance, Navigation and Control Application**

Our design flow was applied to a Guidance Navigation & Control (GNC) onboard spacecraft application that was ported onto ESA's NGMP, more specifically the quad-core LEON4FT processor [1]. In the space industry, multi-cores provide a means for integrating more software functions onto a single platform, which contributes to reducing size, weight, cost, and power consumption. Onboard software has to efficiently utilize the processor resources, while retaining predictability.

A GNC application affects the movement of the vehicle by reading the sensors and controlling the actuators. We estimated the WCETs of all tasks, W_p, by measurements. There are four tasks: the Guidance Navigation Task (T_p = 500 ms, d_p = 500 ms, W_p = 22 ms), the Control Output Task (T_p = 50 ms, d_p = 50 ms, W_p = 3 ms), which sends the outputs to the appropriate spacecraft unit, the Control FM Task (T_p = 50 ms, d_p = 50 ms, W_p = 8 ms), which runs the control and flight management algorithms, and the Data Input Dispatcher Task (T_p = 50 ms, d_p = 50 ms, W_p = 6 ms), which reads, decodes and dispatches data to the right destination whenever new data from the spacecraft's sensors are available. The hyperperiod of the system was therefore 500 ms, and it includes one execution of the Guidance Navigation Task and ten executions of each of the other tasks, which results in 31 jobs. The Guidance Navigation and Control Output tasks were invoked with relative time offsets of 450 ms and 30 ms, respectively. Fig. 7 shows the GNC FPPN, where the functional priorities impose precedence from the numerically smaller FP indices (*i.e.,* higher priority) to the numerically larger ones; we defined them based on an analysis of the specification documents and of the original implementation of task interactions by inter-thread signalling.
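The hyperperiod and job count follow directly from the task periods; a quick back-of-the-envelope check:

```python
from math import lcm

# GNC task periods in milliseconds (task names abbreviated)
periods_ms = {"GuidanceNavigation": 500, "ControlOutput": 50,
              "ControlFM": 50, "DataInputDispatcher": 50}

hyperperiod = lcm(*periods_ms.values())                    # lcm(500, 50) = 500 ms
jobs = sum(hyperperiod // T for T in periods_ms.values())  # 1 + 10 + 10 + 10 = 31
```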

The architectural model in TASTE format was automatically transformed into a BIP model and the task-graph model of the hyperperiod was derived. The task graph was passed to the static scheduler, which calculated the system load to be 112% (*i.e.,* at least two cores required, taking into account precedences [20] and interference [21]) and generated the static schedule.

The BIP model was compiled and linked with the BIP-RTE, and the executables were loaded and run on the LEON4FT board. Figure 8 shows the measured Gantt chart of a hyperperiod (500 ms) plus 100 ms. We label the process executions as 'P<id>', where '<id>' is a numeric process identifier. Label 'P20' is an exception: it indicates the execution of the BIP-RTE engine and of all discrete-event controllers – event generators, functional priority controllers, and the online

**Fig. 7.** The GNC FPPN model

**Fig. 8.** Execution of the GNC application on LEON4FT (in microseconds).

scheduler. Since there are four discrete transitions per job execution and 31 jobs per hyperperiod, 31 × 4 = 124 discrete transitions are executed by the BIP-RTE per hyperperiod. The P20 activities were mapped to Core 0, whereas the jobs of the tasks (P1, P2, P3, P4) were mapped to Core 1 and Core 2. P1 stands for the Data Input Dispatcher, P2 for the Control FM, P3 for the Control Output and P4 for the Guidance Navigation task. Right after 10 consecutive jobs of P1, P2, P3, the job of P4 is executed. The job of P4 is delayed due to its 450 ms invocation offset and its lowest functional priority. Since P3 and P4 do not communicate via the channels, in our framework (P3, P4) ∉ FP and they can execute in parallel, which was actually programmed in our static schedule. Since the system load exceeds 100%, this parallelism was necessary for deadline satisfaction.

## **8 Conclusion**

We presented the formal semantics of the FPPN model at two levels: zero-delay semantics with precedence constraints on the job execution order to ensure functional determinism, and real-time semantics for scheduling. The semantics was implemented by a model transformation framework. Our approach was validated on a spacecraft on-board application running on a multi-core. In future work, we consider it important to improve the efficiency of code generation and to formally prove the equivalence between the scheduling constraints (such as the task graph) and the generated BIP model. The offline and online schedulers also need to be enhanced to support a wider spectrum of online policies and a better awareness of resource interference.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Distributed Graph Queries for Runtime Monitoring of Cyber-Physical Systems**

Márton Búr1,3(B), Gábor Szilágyi2, András Vörös1,2, and Dániel Varró1,2,3

<sup>1</sup> MTA-BME Lendület Cyber-Physical Systems Research Group, Budapest, Hungary {bur,vori,varro}@mit.bme.hu <sup>2</sup> Department of Measurement and Information Systems,

Budapest University of Technology and Economics, Budapest, Hungary

<sup>3</sup> Department of Electrical and Computer Engineering, McGill University, Montreal, Canada

**Abstract.** In safety-critical cyber-physical systems (CPS), a service failure may result in severe financial loss or damage to human life. Smart CPSs have complex interactions with an environment that is rarely known in advance, and they heavily depend on intelligent data processing carried out over a heterogeneous computation platform to provide autonomous behavior. This complexity makes design-time verification infeasible in practice, and many CPSs need advanced runtime monitoring techniques to ensure safe operation. While graph queries are a powerful technique used in many industrial design tools for CPSs, in this paper we propose to use them to specify safety properties for runtime monitors at a high level of abstraction. Distributed runtime monitoring is carried out by evaluating graph queries over a distributed runtime model of the system which incorporates domain concepts and platform information. We provide a semantic treatment of distributed graph queries using 3-valued logic. Our approach is illustrated, and an initial evaluation is carried out, using the MoDeS3 educational demonstrator of CPSs.

## **1 Introduction**

A smart and safe cyber-physical system (CPS) [23,30,36] heavily depends on intelligent data processing carried out over a heterogeneous computation platform to provide autonomous behavior with complex interactions with an environment which is rarely known in advance. Such complexity frequently makes design-time verification infeasible in practice, thus CPSs need to rely on run-time verification (RV) techniques to ensure safe operation by monitoring.

Traditionally, RV techniques have evolved from formal methods [24,26], which provide a high level of precision but offer a low-level specification language (with simple atomic predicates to capture information about the system), which hinders their use in everyday engineering practice. Recent RV approaches [17] have started to exploit rule-based techniques over a richer information model.

In this paper, we aim to address runtime monitoring of distributed systems from a different perspective by using runtime models (aka models@runtime [8,38]), which have been promoted for the assurance of self-adaptive systems in [10,44]. The idea is that runtime models serve as a rich knowledge base for the system by capturing the runtime status of the domain, services and platforms as a graph model, which serves as a common basis for executing various analysis algorithms. Offering centralized runtime models accessible via the network, the Kevoree Modeling Framework [28] has been successfully applied in numerous Internet-of-Things applications over the Java platform. However, the use of such runtime models for analysis purposes in *resource-constrained* smart devices or critical CPS components is problematic due to the lack of control over the actual deployment of the model elements to the execution units of the platform.

Graph queries have already been applied in various design and analysis tools for CPSs thanks to their highly expressive declarative language and their scalability to large industrial models [40]. Distributed graph query evaluation techniques have been proposed in [22,34], but all of these approaches use a cloud-based execution environment, and the techniques are not directly applicable to a heterogeneous execution platform with low-memory computation units.

As a *novelty* in our paper, we specify *safety criteria for runtime monitoring by graph queries* formulated over runtime models (with domain concepts, platform elements, and allocation as runtime information) where graph query results highlight model elements that violate a safety criterion. Graph queries are evaluated over a distributed runtime model where each model element is managed by a dedicated computing unit of the platform while relevant contextual information is communicated to neighboring computing units periodically via asynchronous messages. We provide a *semantic description for the distributed runtime model using 3-valued logic* to uniformly capture contextual uncertainty or message loss. Then we discuss how *graph queries can be deployed as a service to the computing units* (i.e., low-memory embedded devices) of the execution platform of the system in a distributed way, and provide precise *semantics of distributed graph query evaluation over our distributed runtime model*. We provide an *initial performance evaluation* of our distributed query technique over the MoDeS3 CPS demonstrator [45], which is an open source educational platform, and also compare its performance to an open graph query benchmark [35].

## **2 Overview of Distributed Runtime Monitoring**

Figure 1 gives an overview of distributed runtime monitoring of CPSs deployed over a heterogeneous computing platform using runtime models and graph queries.

Our approach reuses a *high-level graph query language* [41] *for specifying safety properties of runtime monitors*, a language widely used in various design tools for CPS [37]. Graph queries can capture safety properties with rich structural dependencies between system entities, which is unprecedented in most temporal logic formalisms used for runtime monitoring. Similarly, OCL has been used in [20] for related purposes. While graph queries can be extended to express

**Fig. 1.** Distributed runtime monitoring by graph queries

temporal behavior [11], our current work is restricted to (structural) safety properties where the violation of a property is expressible by graph queries.

These queries will be *evaluated over a runtime model which reflects the current state of the monitored system*, e.g. data received from different sensors, the services allocated to computing units, or the health information of the computing infrastructure. In accordance with the models@runtime paradigm [8,38], observable changes of the real system get reflected in the runtime model—either periodically with a certain frequency, or in an event-driven way upon certain triggers.

Runtime monitor programs are *deployed to a distributed heterogeneous computation platform*, which may include various types of computing units ranging from ultra-low-power microcontroller units, through smart devices, to high-end cloud-based servers. These computation units primarily process the data provided by sensors, and they are able to perform edge- or cloud-based computations based on the acquired information. The monitoring programs are deployed and executed on them exactly like the primary services of the system, thus resource restrictions (CPU, memory) need to be respected during allocation.

Runtime monitors are synthesized by *transforming high-level query specifications into deployable, platform dependent source code* for each computation unit used as part of a monitoring service. The synthesis includes a query optimization step and a code generation step to produce platform-dependent C++ source code ready to be compiled into an executable for the platform. Due to space restrictions, this component of our framework is not detailed in this paper.

Our system-level monitoring framework is hierarchical and distributed. Monitors may observe the local runtime model of their own computing unit, and they can collect information from the runtime models of different devices, hence providing a distributed monitoring architecture. Moreover, one monitor may rely on information computed by other monitors, thus yielding a hierarchical network.

**Running Example.** We illustrate our runtime monitoring technique in the context of a CPS demonstrator [45], which is an educational platform of a model railway system that prevents trains from collision and derailment using safety monitors. The railway track is equipped with several sensors (cameras, shunt detectors) capable of sensing trains on a particular segment of a track connected to some computing units, such as *Arduinos*, *Raspberry Pis*, *BeagleBone Blacks* (BBB), or a *cloud platform*. Computing units also serve as actuators to stop trains on selected segments to guarantee safe operation. For space considerations, we will only present a small self-contained fragment of the demonstrator.

In Fig. 1, the *System Under Monitor* is a snapshot of the system where train tr1 is on segment s4, while tr2 is on s2. The railroad network has a static layout, but turnouts tu1 and tu2 can change between straight and divergent states. Three BBB computing units are responsible for monitoring and controlling disjoint parts of the system. A computing unit may read its local sensors (e.g. the occupancy of a segment, or the status of a turnout), collect information from other units during monitoring, and operate actuators accordingly (e.g. change a turnout state) for the designated segment. All this information is reflected in the (distributed) runtime model which is deployed on the three computing units and available to the runtime monitors.

#### **3 Towards Distributed Runtime Models**

#### **3.1 Runtime Models**

Many industrial modeling tools used for engineering CPS [3,31,47] build on the concepts of domain-specific (modeling) languages (DSLs) where a domain is typically defined by a *metamodel* and a set of well-formedness constraints. A metamodel captures the main concepts in a domain as classes with attributes, their relations as references, and specifies the basic structure of graph models.

A metamodel can be formalized as a vocabulary Σ = {C1,...,Cn1, A1,...,An2, R1,...,Rn3} with a unary predicate symbol Ci for each class, a binary predicate symbol Aj for each attribute, and a binary predicate symbol Rk for each relation.

*Example 1.* Figure 2 shows a metamodel for the CPS demonstrator with Computing Units (identified on the network by a hostID attribute) which host Domain Elements and communicate with other Computing Units. A Domain Element is either a Train or a Railroad Element, where the latter is either a Turnout or a Segment. A Train is situated on a Railroad Element, which is connected to at most two other Railroad Elements. Furthermore, a Turnout refers to the Railroad Elements connecting to its straight and divergent exits. A Train also knows its speed.

Objects, their attributes, and links between them constitute a runtime model [8,38] of the underlying system in operation. Changes to the system and its environment are reflected in the runtime model (in an event-driven or time-triggered way) and operations executed on the runtime model (e.g. setting values of controllable attributes or relations between objects) are reflected in the system

**Fig. 2.** Metamodel for CPS demonstrator

itself (e.g. by executing scripts or calling services). We assume that this runtime model is self-descriptive in the sense that it contains information about the computation platform and the allocation of services to platform elements, which is a key enabler for self-adaptive systems [10,44].

A *runtime model* M = ⟨Dom_M, I_M⟩ can be formalized as a 2-valued logic structure over Σ, where Dom_M = Obj_M ⊎ Data_M with Obj_M a finite set of objects and Data_M the set of (built-in) data values (integers, strings, etc.). I_M is a 2-valued interpretation of the predicate symbols in Σ defined as follows:


#### **3.2 Distributed Runtime Models**

Our framework addresses decentralized systems where each computing unit periodically communicates a part of its internal state to its neighbors in an *update phase*. We abstract from the technical details of communication, but we assume approximate synchrony [13] between the clocks of the computing units; thus all update messages that do not arrive within a given time frame T_update are regarded as lost.

As such, a centralized runtime model is not a realistic assumption for mixed synchronous systems. First, each computing unit has only incomplete knowledge about the system: it fully observes and controls a fragment of the runtime model (to enforce the single source of truth principle), while it is unaware of the internal state of objects hosted by other computing units. Moreover, uncertainty may arise in the runtime model due to sensing or communication issues.

*Semantics of Distributed Runtime Models.* We extend the concept of runtime models to a distributed setting with heterogeneous computing units which periodically communicate certain model elements to each other via messages. We introduce a semantic representation for *distributed runtime models* (DRMs) which can abstract from the actual communication semantics (e.g. asynchronous messages vs. broadcast messages) by (1) evaluating predicates locally at a computing unit with (2) a 3-valued truth evaluation having a third value 1/2 in case of uncertainty. Each computing unit maintains in its local knowledge base a set of facts described by atomic predicates about the objects (with attributes) it hosts and the references between local objects. Additionally, each computing unit incorporates predicates describing the outgoing references of each object it hosts.

The 3-valued truth evaluation of a predicate P(v1,...,vn) on a computing unit *cu* is denoted by [[P(v1,...,vn)]]@*cu*. The DRM of the system is constituted by the truth evaluations of all predicates on all computing units. For the current paper, we assume the single source of truth principle, i.e. each model element is always faithfully observed and controlled by its host computing unit, thus the local truth evaluation of the corresponding predicate P is always 1 or 0. However, the 3-valued evaluation could be extended to also handle local uncertainties.

**Fig. 3.** Distributed runtime model for CPS demonstrator

*Example 2.* Figure 3 shows a DRM snapshot for the CPS demonstrator (bottom part of Fig. 1). Computing units BBB1–BBB3 manage different parts of the system, e.g. BBB1 hosts objects s1, s2, tu1 and tr2 and the links between them. We illustrate the local knowledge bases of computing units.

Since computing unit BBB1 hosts train tr2, [[Train(tr2)]]@BBB1 = 1. However, according to computing unit BBB2, [[Train(tr2)]]@BBB2 = 1/2, as there is no train tr2 hosted on BBB2, but it may exist on a different unit.

Similarly, [[ConnectedTo(s1,s7)]]@BBB1 = 1, as BBB1 is the host of s1, the source of the reference. This means BBB1 knows that there is a (directed) reference of type connectedTo from s1 to s7. However, the knowledge base on BBB3 may have uncertain information about this link, thus [[ConnectedTo(s1,s7)]]@BBB3 = 1/2, i.e. there may be a corresponding link from s1 to s7, but it cannot be deduced using exclusively the predicates evaluated at BBB3.
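The local 3-valued evaluation of Example 2 could be sketched as follows. The class, its fields, and the exact sets of hosted objects are our own illustration (only the hosting of s1, s2, tu1 and tr2 on BBB1 is stated in the example):

```python
from fractions import Fraction

UNKNOWN = Fraction(1, 2)   # the third truth value 1/2

class ComputingUnit:
    """Local knowledge base of one computing unit (illustrative sketch)."""

    def __init__(self, hosted, facts):
        self.hosted = hosted   # objects this unit observes and controls
        self.facts = facts     # true ground atoms, e.g. ("Train", "tr2")

    def evaluate(self, pred, *args):
        atom = (pred, *args)
        if atom in self.facts:
            return 1
        if args[0] in self.hosted:
            # single source of truth: the unit is authoritative for
            # atoms whose (source) object it hosts
            return 0
        return UNKNOWN         # the atom may hold on another unit

bbb1 = ComputingUnit({"s1", "s2", "tu1", "tr2"},
                     {("Train", "tr2"), ("ConnectedTo", "s1", "s7")})
bbb2 = ComputingUnit({"s3", "s4", "tu2", "tr1"}, {("Train", "tr1")})
bbb3 = ComputingUnit({"s5", "s6", "s7"}, set())
```

With these knowledge bases, `bbb1.evaluate("Train", "tr2")` yields 1, while on BBB2 and BBB3 the same atom (and the link from s1 to s7 on BBB3) evaluates to 1/2, matching Example 2.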

## **4 Distributed Runtime Monitoring**

#### **4.1 Graph Queries for Specifying Safety Monitors**

To capture the safety properties to be monitored, we rely on the VIATRA Query Language (VQL) [7]. VIATRA has been intensively used in various design tools of CPSs to provide scalable queries over large system models. The current paper aims to reuse this declarative graph query language for runtime verification purposes, which is a novel idea. The main benefit is that safety properties can be captured on a high level of abstraction over the runtime model, which eases the definition and comprehension of safety monitors for engineers. Moreover, this specification is free from any platform-specific or deployment details.

The expressiveness of the VQL language converges to first-order logic with transitive closure, thus it provides a rich language for capturing a variety of complex structural conditions and dependencies. Technically, a graph query captures the erroneous case; thus, when the query is evaluated over a runtime model, any match (result) of the query highlights a violation of the safety property at runtime.

*Example 3.* In the railway domain, safety standards prescribe a minimum distance between trains on track [1,14]. Query closeTrains captures a (simplified) description of the minimum headway distance to identify violating situations where trains have only limited space between each other. Technically, one needs to detect if there are two different trains on two different railroad elements, which are connected by a third railroad element. Any match of this pattern highlights track elements where passing trains need to be stopped immediately. Figure 4a shows the graph query closeTrains in a textual syntax, Fig. 4b displays it as a graph formula, and Fig. 4c is a graphical illustration as a graph pattern.
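Over a centralized model, the matches of this pattern can be enumerated by a nested search; the following sketch is a simplification of closeTrains that treats connectedTo links as directed and ignores the VQL pattern-call machinery (`on` and `connected` encode the On and ConnectedTo predicates):

```python
def close_trains(on, connected):
    """All matches of the simplified closeTrains pattern.

    `on` maps each train to the railroad element it occupies;
    `connected` is a set of directed (from, to) connectedTo links.
    """
    matches = set()
    for t1, e1 in on.items():
        for t2, e2 in on.items():
            if t1 == t2 or e1 == e2:
                continue   # two different trains on two different elements
            # look for a third railroad element e3 linking the occupied ones
            for (src, e3) in connected:
                if src == e1 and (e3, e2) in connected:
                    matches.add((t1, e1, e3, e2, t2))
    return matches
```

For the snapshot of Fig. 1 with tr2 on s2, tr1 on s4, and links s2→s3→s4, the single match reports the two trains separated only by s3.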

*Syntax.* Formally, a graph pattern (or query) is a first-order logic (FOL) formula ϕ(v1,...,vn) over variables [42]. A graph pattern ϕ can be inductively constructed (see Table 1) by using the atomic predicates of runtime models C(v), A(v1, v2), R(v1, v2) with C, A, R ∈ Σ, equality between variables v1 = v2, the FOL connectives ∨, ∧, the quantifiers ∃, ∀, and positive (*call*) or negative (*neg*) pattern calls.



**Table 1.** Semantics of graph patterns (predicates)

This language enables to specify a hierarchy of runtime monitors as a query may explicitly use results of other queries (along pattern calls). Furthermore, distributed evaluation will exploit a spatial hierarchy between computing units.

*Semantics.* A graph pattern ϕ(v1,...,vn) can be evaluated over a (centralized) runtime model M (denoted by [[ϕ(v1,...,vn)]]^M_Z) along a variable binding Z : {v1,...,vn} → Dom_M from variables to objects and data values in M, in accordance with the semantic rules defined in Table 1 [42].

A variable binding Z is called a *match* if pattern ϕ evaluates to 1 over M, i.e. [[ϕ(v1,...,vn)]]^M_Z = 1. Below, we may use [[ϕ(v1,...,vn)]] as a shorthand for [[ϕ(v1,...,vn)]]^M_Z when M and Z are clear from the context. Note that min and max take the numeric minimum and maximum of the values 0, 1/2 and 1, with 0 ≤ 1/2 ≤ 1.
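Under these rules, conjunction and disjunction reduce to the numeric minimum and maximum over {0, 1/2, 1}; a minimal sketch, assuming Kleene-style negation (Table 1 additionally covers quantifiers and pattern calls):

```python
from fractions import Fraction

HALF = Fraction(1, 2)   # the truth value 1/2

def and3(*values):
    return min(values)  # [[phi1 AND phi2]] = min of the operands

def or3(*values):
    return max(values)  # [[phi1 OR phi2]] = max of the operands

def not3(value):
    return 1 - value    # swaps 0 and 1, keeps 1/2 fixed
```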

#### **4.2 Execution of Distributed Runtime Monitors**

To evaluate graph queries of runtime monitors in a distributed setting, we propose to deploy queries to the same target platform in a way that is compliant with the distributed runtime model and the potential resource restrictions of computation units. If a graph query engine is deployed as a service on a computing unit, it can serve as a *local monitor* over the runtime model. However, such local monitors are usable only when all graph nodes traversed and retrieved during query evaluation are deployed on the same computing unit, which is not the general case. Therefore, a *distributed monitor* needs to gather information from other model fragments and monitors stored at different computing units.

*A Query Cycle.* Monitoring queries are evaluated over a distributed runtime model during the *query cycle*, where individual computing units communicate with each other asynchronously in accordance with the actor model [18].


*Example 4.* Figure 5 shows the beginning of a query evaluation sequence for monitor closeTrains initiated at computing unit BBB3. Calls are asynchronous (cf. actor model), while diagonal lines illustrate the latency of network communication. Message numbers represent the order between timestamps of messages.

When the query is initiated (message 1, shortly m1), the first predicate Train of the query is sent to the other two computing units as a request with a free variable parameter T (m2 and m3). In the reply messages, BBB2 reports tr1 as an object satisfying the predicate (m4), while BBB1 answers that tr2 is a suitable binding for T (m5). Next, BBB3 requests facts about outgoing

**Fig. 5.** Beginning of distributed query execution for monitor closeTrains

references of type On leading from objects tr2 and tr1 to objects stored in BBB1 and BBB2, respectively (m6 and m7). In answer, each computing unit sends back the facts stating the outgoing references from these objects (m8 and m9).

The next message (m10) asks for the outgoing references of type ConnectedTo from object s2. Before sending a reply, BBB1 first asks BBB2 to confirm that a reference from s2 to s3 exists, since s3 is hosted by BBB2 (m11). This check adds tolerance against messages lost during a model update. After BBB1 receives the answer from BBB2 (m12), it replies to BBB3 with all the facts maintained on this node.

*Semantics of Distributed Query Evaluation.* Each query is initiated at a designated computing unit which is responsible for calculating the query results by aggregating the partial results retrieved from its neighbors. This aggregation has two different dimensions: (1) adding new matches to the result set calculated by the provider, and (2) making a potential match more precise. While the first case is a consequence of distributing the runtime model and the query evaluation, the second case is caused by information made uncertain by message loss or delay.

Fortunately, the 3-valued semantics of graph queries (see Table 1) already handles the first case: any match reported to the requester by a neighboring provider will be included in the query results if its truth evaluation is 1 or 1/2. As such, any potential violation of a safety property will be detected; this may result in false positive alerts, but critical situations will not be missed.

However, the second case necessitates extra care since query matches coming from different sources (e.g. local cache, reply messages from providers) need to be fused in a consistent way. This match fusion is carried out at cu as follows:


Note that the second case uses max{} to take the maximum of 3-valued logic values wrt. the *information ordering* (which is different from the numerical maximum used in Table 1). The information ordering is a partial order ({1/2, 0, 1}, ⊑) with 1/2 ⊑ 0 and 1/2 ⊑ 1. It is worth pointing out that this distributed truth evaluation is also in line with the axioms of Sobociński's 3-valued logic [33].
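The fusion wrt. the information ordering can be sketched as follows: an unknown 1/2 is refined by any definite value, and under the single-source-of-truth assumption two definite values for the same match never conflict (the function name is our own):

```python
from fractions import Fraction

UNKNOWN = Fraction(1, 2)   # least element of the information ordering

def fuse(a, b):
    """Maximum wrt the information ordering ({1/2, 0, 1}, with 1/2 below 0 and 1)."""
    if a == UNKNOWN:
        return b           # any definite value refines 1/2
    if b == UNKNOWN:
        return a
    # two definite values must agree under single source of truth
    assert a == b, "conflicting definite values violate single source of truth"
    return a
```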

*Performance Optimizations.* Each match sent as a reply to a computing unit during distributed query evaluation can be cached locally to speed up the reevaluation of the same query within the query cycle. This *caching of query results* is analogous to *memoing* in logic programming [46]. Currently, cache invalidation is triggered at the end of each query cycle by the local physical clock, which we assume to be (quasi-)synchronous with high precision across the platform.
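The memoing of remote replies could look like the following sketch, with the cache dropped at the clock-triggered end of each query cycle (class and method names are illustrative, not the framework's API):

```python
class QueryCache:
    """Per-cycle memo table for replies received during query evaluation."""

    def __init__(self):
        self._memo = {}

    def get_or_fetch(self, key, fetch):
        # reuse a reply already received in this query cycle,
        # otherwise ask the remote provider and remember the answer
        if key not in self._memo:
            self._memo[key] = fetch()
        return self._memo[key]

    def end_of_cycle(self):
        # invalidation is triggered by the local clock at cycle end
        self._memo.clear()
```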

This memoing approach also enables units to selectively store messages in the local cache depending on their specific needs. Furthermore, it makes it possible to deploy query services to computing units with a limited amount of memory and prevents memory overflow caused by the many messages sent over the network.

A graph query is evaluated according to a *search plan* [43], which is a list of predicates ordered so that matches of the predicates can be found efficiently. During query evaluation, free variables of the predicates are bound to values following the search plan. The evaluation terminates when all matches in the model have been found. An in-depth discussion of query optimization is out of scope for this paper, but Sect. 5 provides an initial investigation.
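The search-plan idea can be sketched as follows: each predicate, given a partial variable binding, yields all consistent extensions, and evaluation chains the predicates in plan order. This is a toy illustration with a hypothetical "two distinct trains on the same segment" pattern, not the paper's generated code:

```python
def evaluate(search_plan, binding=None):
    """Enumerate all matches by extending variable bindings predicate by
    predicate, following the order of the search plan."""
    binding = binding or {}
    if not search_plan:
        yield dict(binding)
        return
    predicate, rest = search_plan[0], search_plan[1:]
    for extension in predicate(binding):
        yield from evaluate(rest, extension)

# Hypothetical toy model: 'on' edges from trains to track segments.
on = [("t1", "s1"), ("t2", "s1")]

def train_on(b):        # binds T and S along an 'on' edge
    for t, s in on:
        yield {**b, "T": t, "S": s}

def other_train(b):     # binds a second, distinct train T2 on segment S
    for t, s in on:
        if s == b["S"] and t != b["T"]:
            yield {**b, "T2": t}

matches = list(evaluate([train_on, other_train]))
assert {"T": "t1", "S": "s1", "T2": "t2"} in matches
assert len(matches) == 2
```

A good plan orders predicates so that selective ones come first, shrinking the set of partial bindings early; this is exactly what a cost-based plan generator optimizes.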

*Semantic Guarantees and Limitations.* Our construction ensures that (1) the execution will surely terminate upon reaching the end of the query time window, potentially yielding uncertain matches, (2) each local model serves as a single source of truth which cannot be overridden by calls to other computing units, and (3) matches obtained from multiple computing units will be fused by preserving information ordering. The over- and under-approximation properties of 3-valued logic show that the truth values fused this way provide a sound result (Theorem 1 in [42]). Despite the lack of total consistency, our approach still provides safety guarantees by detecting all *potentially* unsafe situations.

There are also several assumptions and limitations in our approach. We use asynchronous communication without broadcast messages. We assumed only faults of communication links, not failures of computing units. We also excluded the case where computing units maliciously send false information. Instead of refreshing local caches in each cycle, the runtime model could incorporate information aging, which may enable handling other sources of uncertainty (currently limited to the consequences of message loss). Finally, in the case of longer cycles, the runtime model may no longer provide up-to-date information at query evaluation time.

*Implementation Details.* The concepts presented in this paper are implemented in a software prototype with three main components: (i) an EMF-based tool [39] for data modeling and code generation for the runtime model, (ii) an Eclipse-based tool for defining and compiling monitoring rules built on top of the VIATRA framework [41], and (iii) the runtime environment to evaluate queries.

The design tools are predominantly implemented in Java. We used EMF metamodels for data modeling but created a code generator to derive lightweight C++ classes as representations of the runtime model. The query definition environment was extended to automatically compile queries into C++ monitors.

The runtime monitoring libraries and the runtime framework are implemented in C++. Our choice of C++ is motivated by its low runtime and memory overhead on almost any type of platform, ranging from low-energy embedded microcontrollers to large-scale cloud environments. Technically, a generic *query service* can start *query runners* for each monitoring objective on each node. While query runners execute the query-specific search plan generated at compile time, network communication is handled by the query service when needed. To serialize data between nodes, we used the lightweight Protocol Buffers library [16].

## **5 Evaluation**

We conducted measurements to evaluate our approach and address two research questions:


## **5.1 Measurement Setup**

*Computation Platform.* We used the real distributed (physical) platform of the CPS demonstrator to answer these research questions (instead of setting up a virtual environment). It consists of 6 interconnected BBB devices (all running embedded Debian Jessie with the PREEMPT-RT patch) connected to the railway track itself. This arrangement represents a distributed CPS with several computing units that have only limited computation and communication resources. We used these units to maintain the distributed runtime model and evaluate monitoring queries. This way we are able to provide a realistic evaluation; however, due to the fixed number of embedded devices built into the platform, we cannot evaluate the scalability of the approach wrt. the number of computing units.

*CPS Monitoring Benchmark.* To assess the distributed runtime verification framework, we used the MoDeS3 railway CPS demonstrator where multiple *safety properties* are monitored. They are all based on important aspects of the domain, and they have been integrated into the real monitoring components. Our properties of interest (in increasing complexity of queries) are the following:


Since the original runtime model of the CPS demonstrator has only a total of 49 objects, we scaled up the model by replicating the original elements (except for the computing units). This way we obtained models with 49–43006 objects and 114–109015 links, with structural properties similar to the original one.

*Query Evaluation Benchmark.* In order to provide an independent evaluation for our model query-based monitoring approach, we adapted the open-source Train Benchmark [35] that aims at comparing query evaluation performance of various tools. This benchmark defines several queries describing violations of well-formedness constraints with different complexity over graph models. Moreover, it also provides a model generator to support scalability assessment.

#### **5.2 Measurement Results**

*Execution Times.* The query execution times over models deployed to a single BBB were first measured to obtain a *baseline evaluation time of monitoring* for each rule (referred to as *local* evaluation). Then the execution times of system-level distributed queries were measured over the platform with 6 BBBs, evaluating two different allocations of objects (*standard* and *alternative* evaluations).

In Fig. 6 each result captures the times of 29 consecutive evaluations of queries excluding the warm-up effect of an initial run which loads the model and creates necessary auxiliary objects. A query execution starts when a node initiates evaluation, and terminates when all nodes have finished collecting matches and sent back their results to the initiator.

*Overhead of Distributed Evaluation.* On the positive side, the performance of graph query evaluation on a single unit is comparable to other graph query techniques reported in [35] for models with over 100 K objects, which shows a certain level of maturity of our prototype. Furthermore, the CPS demonstrator showed that distributed query evaluation yielded significantly better results than local-only execution for the *Derailment* query on medium-size models (with 4K–43K objects, reaching a 2.23×–2.45× average speed-up) and comparable runtime for the *Close trains* and *Train locations* queries on these models (with the greatest average difference being 30 ms across all model sizes). However, distributed query evaluation had problems with *End of siding*, a complex query with negative application conditions, which provides clear directions for future research. Anyhow, the parallelism

**Fig. 6.** Query evaluations times over different model sizes

of even a small execution platform with only 6 computing units could offset the communication overhead between units in the case of several distributed queries, which is certainly a promising outcome.

*Impact of Allocation on Query Evaluation.* We synthesized different allocations of model elements to computing units to investigate the impact of the allocation of model objects on query evaluation. With the CPS demonstrator model in particular, we chose to allocate all Trains to BBB1 and reassigned every other object previously stored on BBB1 to the rest of the computing units. Similarly, for the Train Benchmark models, we followed this pattern with selected types, in addition to experimenting with fully random allocation of objects.

The two right-most columns of Fig. 6a and 6b show results of two alternative allocations for the same search plan, with a peak difference of 2.06× (*Derailment*) and 19.92× (*Semaphore neighbor*) in the two cases. However, both of these allocations were manually optimized to exploit the locality of model elements. In the case of random allocations, the difference in runtime may reach an order of magnitude<sup>1</sup>. Therefore, it is worth investigating new allocation strategies and search plans for distributed queries in future work.

*Threats to Validity.* The generalizability of our experimental results is limited by certain factors. First, to measure the performance of our approach, the platform devices (1) executed only query services and (2) connected to an isolated local area network via Ethernet. Performance on a real network with a busy channel would likely suffer longer delays and message losses, thus increasing execution time. Furthermore, we assessed performance using a single query plan synthesized automatically by the VIATRA framework, using heuristics intended for deployment on a single computation unit. We believe that execution times of distributed queries would likely decrease with a carefully constructed search plan and allocation.

## **6 Related Work**

*Runtime Verification Approaches.* For continuously evolving and dynamic CPSs, an upfront design-time formal analysis needs to incorporate and check the robustness of component behavior in a wide range of contexts and families of configurations, which is a very complex challenge. Thus, consistent system behavior is frequently ensured by runtime verification (RV) [24], which checks (potentially incomplete) execution traces against formal specifications by synthesizing verified runtime monitors from provably correct design models [21,26].

Recent advances in RV (such as MOP [25] or LogFire [17]) promote capturing specifications by rich logics over quantified and parameterized events (e.g., quantified event automata [4] and their extensions [12]). Moreover, Havelund proposed to check such specifications on the fly by exploiting rule-based systems based on the RETE algorithm [17]. However, this technique only incorporates low-level events, while changes of an underlying data model are not considered as events.

<sup>1</sup> See Appendix A for details under http://bit.ly/2op3tdy.

Traditional RV approaches use variants of temporal logics to capture the requirements [6]. Recently, novel combinations of temporal logics with context-aware behaviour description [15,19] (developed within the R3-COP and R5-COP FP7 projects) have appeared for the runtime verification of autonomous CPS, providing a rich language for defining correctness properties of evolving systems.

*Runtime Verification of Distributed Systems.* While several techniques exist for runtime verification of sequential programs, the authors of [29] claim that much less research has been done in this area for distributed systems. Furthermore, they provide the first sound and complete algorithm for runtime monitoring of distributed systems based on the 3-valued semantics of LTL.

The recently introduced Brace framework [49] supports RV in distributed resource-constrained environments by incorporating dedicated units in the system to support global evaluation of monitoring goals. The approach of [5] evaluates LTL formulae in a fully distributed manner for components communicating on a synchronous bus in a real-time system. Additionally, a machine learning-based solution for a scalable fault detection and diagnosis system is presented in [2], which builds on correlations between observable system properties.

*Distributed Graph Queries.* Highly efficient techniques for local-search-based [9] and incremental model queries [40] have been developed as part of the VIATRA framework, which mainly builds on RETE networks as a baseline technology. In [34], a distributed incremental graph query layer deployed over a cloud infrastructure with numerous optimizations was developed. Distributed graph query evaluation techniques were reported in [22,27,32], but none of these techniques considered an execution environment with resource-constrained computation units.

*Runtime Models.* The models@runtime paradigm [8] serves as the conceptual basis for the Kevoree framework [28] (developed within the HEADS FP7 project). Other recent distributed, data-driven solutions include the Global Data Plane [48] and executable metamodels at runtime [44]. However, these frameworks currently offer very limited support for efficiently evaluating queries over a distributed runtime platform, which is the main focus of our current work.

## **7 Conclusions**

In this paper, we proposed a runtime verification technique for smart and safe CPSs, using a high-level graph query language to capture safety properties for runtime monitoring and runtime models as a rich knowledge representation of the current state of the running system. A distributed query evaluation technique was introduced in which no computing unit has a global view of the complete system. The approach was implemented and evaluated on the physical system of the MoDeS3 CPS demonstrator. Our first results show that it scales to medium-size runtime models and that the actual deployment of the query components to the underlying platform has a significant impact on execution time. In the future, we plan to investigate how to characterize effective search plans and allocations in the context of distributed queries used for runtime monitoring.

**Acknowledgements.** This paper is partially supported by the MTA-BME Lendület Cyber-Physical Systems Research Group, the NSERC RGPIN-04573-16 project, the Werner Graupe International Fellowship in Engineering (as part of the MEDA program), and the ÚNKP-17-2-I New National Excellence Program of the Ministry of Human Capacities. We are grateful to Oszkár Semeráth for helping with the semantics of 3-valued logic, to Gábor Szárnyas for help with setting up the Train Benchmark, to the contributors of MoDeS3 for setting up the evaluation platform, and for the feedback from the anonymous reviewers and Gábor Bergmann.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# *EventHandler***-Based Analysis Framework for Web Apps Using Dynamically Collected States**

Joonyoung Park<sup>1(B)</sup>, Kwangwon Sun<sup>2</sup>, and Sukyoung Ryu<sup>1(B)</sup>

<sup>1</sup> KAIST, Daejeon, Republic of Korea {gmb55,sryu.cs}@kaist.ac.kr
<sup>2</sup> Samsung Electronics, Seoul, Republic of Korea kwangwon.sun@samsung.com

**Abstract.** JavaScript web applications (apps) are prevalent these days, and quality assurance of web apps gets even more important. Even though researchers have studied various analysis techniques and software industries have developed code analyzers for their own code repositories, statically analyzing web apps in a sound and scalable manner is challenging. On top of dynamic features of JavaScript, abundant execution flows triggered by user events make a sound static analysis difficult.

In this paper, we propose a novel *EventHandler* (*EH*)-based static analysis for web apps using dynamically collected state information. Unlike traditional whole-program analyses, the *EH*-based analysis intentionally analyzes partial execution flows using concrete user events. Such analyses surely miss execution flows of the entire program, but they analyze fewer infeasible flows, reporting fewer false positives. Moreover, they can finish analyzing partial flows of web apps that whole-program analyses often fail to finish analyzing, and produce partial bug reports. Our experimental results show that the *EH*-based analysis improves precision dramatically compared with a state-of-the-art JavaScript whole-program analyzer, and that it can finish analysis of partial execution flows in web apps that the whole-program analyzer fails to analyze within a timeout.

**Keywords:** JavaScript · Web applications · Event analysis · Static analysis

## **1 Introduction**

Web applications (apps) written in HTML, CSS, and JavaScript have become prevalent, and JavaScript is now the 7th most popular programming language [22]. Because web apps can run on any platform and device that provides a browser, they are widely used. The overall structure of a web app is specified in HTML, which is represented as a tree structure via Document Object Model (DOM) APIs. CSS describes visual effects like colors, positions, and animation of the contents of the web app, and JavaScript handles events triggered by user interaction. JavaScript code can change the status of the web app by interoperating with HTML and CSS, load other JavaScript code dynamically, and access device-specific features via APIs provided by underlying platforms. JavaScript is the *de facto* standard language for web programming these days.

To help developers build high-quality web apps, researchers have studied various analysis techniques and software industries have developed in-house static analyzers. Static analyzers such as SAFE [12,15], TAJS [2,10], and WALA [19] analyze JavaScript web apps without concretely executing them, and dynamic analyzers such as Jalangi [20] utilize concrete values obtained by actually executing the apps. Thus, static analysis results aim to cover all the possible execution flows but they often contain infeasible execution flows, and dynamic analysis results contain only real execution flows but they often struggle to cover abundant execution flows. Such different analysis results are meaningful for different purposes: *sound* static analysis results are critical for verifying absence of bugs and *complete* dynamic analysis results are useful for detecting genuine bugs. In order to enhance the quality of their own software, IT companies develop in-house static analyzers like Infer from Facebook [4] and Tricorder from Google [18].

However, statically analyzing web apps in a sound and scalable manner is extremely challenging. Especially because JavaScript, the language that handles the control of web apps, is highly dynamic, purely static analysis has various limitations. While JavaScript can generate code to execute from string literals during evaluation, such code is not available to static analyzers before run time. In addition, dynamically adding and deleting object properties, and treating property names as values, make static analysis difficult [17]. Moreover, since execution flows triggered by user events are abundant, statically analyzing them often incurs analysis performance degradation [16].

Among the many challenges in statically analyzing JavaScript web apps, we focus on the analysis of event-driven execution flows in this paper. Most existing JavaScript static analyzers focus on analyzing web apps at loading time, and they over-approximate event-driven execution flows to be sound. In order to consider all possible event sequences soundly, they abstract the event-driven semantics in a way that any event can happen in any order. Such a sound event modeling contains many infeasible event sequences, which lead to unnecessary operations computing imprecise analysis results. Thus, state-of-the-art JavaScript static analyzers often fail to analyze event flows in web apps.

In this paper, we propose a novel *EventHandler-based (EH-based) static analysis* for web apps using *dynamically collected state information*. First, we present a new analysis unit, an *EH*. While traditional static analyzers perform whole-program analysis covering all possible execution flows, the *EH*-based analysis aims to analyze *partial* execution flows triggered by user events more precisely. In other words, unlike whole-program analysis, which starts analyzing from a single entry point of a given program, the *EH*-based analysis considers each event function call triggered by a user event as an entry point. Because the *EH*-based analysis enables a subset of the entire execution flows to be analyzed at a time, it analyzes fewer infeasible execution flows than whole-program analysis, which balances soundness and precision. Moreover, since it considers a smaller set of execution flows, it may finish analysis of web apps that whole-program analysis fails to analyze within a reasonable timeout. Second, in order to analyze each event function call in arbitrary call contexts, we present a hybrid approach to construct an abstract heap for the event function call. More specifically, to analyze each event function body, the analyzer needs information about non-local variables. Thus, for each event function, we construct a conservative abstract initial heap that holds abstract values of non-local variables obtained by abstracting dynamically collected states.

We formally present the mechanism as a framework, EHA, parameterized by a dynamic event generator and a static whole-program analyzer. After describing the high-level structure of EHA, we present its prototype implementation, EHA<sup>man</sup><sub>SAFE</sub>, instantiated with manual event generation and the state-of-the-art JavaScript static analyzer SAFE. Our experimental results show that EHA<sup>man</sup><sub>SAFE</sub> indeed reports fewer false positives than SAFE, and it can finish analysis of parts of web apps that SAFE fails to analyze within the timeout of 72 h.

Our paper makes the following contributions:


The remainder of this paper is organized as follows. We first explain the concrete semantics of event handlers in web apps, describe how existing whole-program analyzers handle events in a sound but unscalable manner, and present an overview of our approach using concrete code examples in Sect. 2. We describe EHA and its prototype implementation in Sect. 3 and Sect. 4, respectively. We evaluate the EHA instance using real-world web apps in Sect. 5, discuss related work in Sect. 6, and conclude in Sect. 7 with future work.

## **2 Analyses of Event Handlers**

#### **2.1 Event Handlers in Web Apps**

Web apps may receive *events* from their execution environments like browsers or from users<sup>1</sup>. When a web app receives an event, it reacts to the event by executing JavaScript code registered as a handler (or a listener) of the event. An *event handler* consists of three components: an event target, an event type, and a callback function. An event target may be any DOM object like Element, window, and XMLHttpRequest. An event type is a string representation of the event action type such as "load", "click", and "keydown". Finally, a callback function is a JavaScript function to be executed when its corresponding event occurs.

<sup>1</sup> http://www.w3schools.com/js/js_events.asp.

**Fig. 1.** (a) A conservative modeling of event control flows (b) Modeling in TAJS [9]

Users execute web apps by triggering various events, thus we consider sequences of events triggered by users as user inputs to web apps. During execution, a set of event handlers that can be executed by a user may vary. First, because event handlers are dynamically registered to and removed from DOM objects, executable event handlers for an event change at run time. For example, when a DOM object has only the following event handler registered:

(A, "click", function f(){ B.addEventListener("click", function g(){}); })

if a user clicks the target A, a new event handler becomes registered, which makes two handlers executable. Second, changes in DOM states of a web app also change a set of executable event handlers for an event. For instance, an event target may be removed from document via DOM API calls, which makes the detached event target inaccessible from users. Also, events may not be captured depending on their capturing/bubbling options and CSS style settings of visibility or display. In addition, it is a common practice to manipulate CSS styles like the following:


to hide an element such as a button under another element, making it inaccessible to users. These various features affect the event sequences that users can trigger and the event handlers that are executed accordingly.

#### **2.2 Analysis of Event Handlers in Whole-Program Analyzers**

Most existing whole-program JavaScript analyzers handle event handlers in a sound but unscalable manner, as illustrated in Fig. 1(a). They first analyze top-level code that is statically available in a given web app; event handlers may be registered during the analysis of top-level code. Then, after the "exit block of top-level code" node, they analyze code initiated by event handlers in any order and any number of times, as denoted by the "trigger all event handlers" node. According to this modeling of event control flows, all possible event sequences that occur after loading the top-level code are soundly analyzed. Note that even though whole-program analyzers use this sound event modeling, the analyzers themselves may not be sound because of other features like dynamic code generation. However, because registered event handlers may be removed during evaluation and they may even be inaccessible due to some CSS styles, as discussed in Sect. 2.1, the event modeling in Fig. 1(a) may contain too many infeasible event sequences that are impossible in concrete executions. Analysis with lots of infeasible event sequences involves unnecessary computation that wastes analysis time and often results in imprecise analysis results. Such a conservative modeling of event control flows indeed reports many false positives [16].
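The conservative modeling can be paraphrased as enumerating every handler sequence after top-level code, which makes the infeasible orders explicit. The following sketch (our own illustration; the single-letter handler names are hypothetical) enumerates all sequences up to a bounded length:

```python
from itertools import product

def sound_event_sequences(handlers, up_to):
    """All event sequences of length <= up_to under the conservative
    modeling of Fig. 1(a): after top-level code, any registered handler
    may fire in any order, any number of times."""
    seqs = []
    for n in range(up_to + 1):
        seqs.extend("".join(p) for p in product(handlers, repeat=n))
    return seqs

# With handlers l, a, b the abstraction already contains orders that are
# impossible in concrete runs, e.g. firing b before the load handler l:
seqs = sound_event_sequences("lab", 2)
assert "ba" in seqs and "al" in seqs       # infeasible in concrete executions
assert len(seqs) == 1 + 3 + 9              # empty, length-1, and length-2 sequences
```

The unbounded version of this set is what the "trigger all event handlers" loop denotes; the exponential growth in sequence length illustrates why analyzing it precisely is costly.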

To reduce the amount of infeasible event sequences to analyze, TAJS uses a refined modeling of event control flows as shown in Fig. 1(b). Among various event handlers, this modeling distinguishes "load event handlers" and analyzes them before all the other event handlers. While this modeling is technically unsound because non-load events may precede load events [15], most web apps satisfy this modeling in practice. Moreover, because load event handlers often initialize top-level variables, the event modeling in Fig. 1(a) often produces false positives by analyzing non-load event functions before load event functions initialize top-level variables. On the contrary, the TAJS modeling reduces such false positives by analyzing load event handlers before non-load event handlers. Although the TAJS modeling distinguishes a load event, the over-approximation of the other event handler calls still brings analysis precision and scalability issues.

#### **2.3 Analysis of Event Handlers in** *EH* **-Based Analyzers**

To alleviate the analysis precision and scalability problems caused by event modeling, we propose the EHA framework, which aims to analyze a subset of execution flows within a limited time budget to detect bugs in partial execution flows rather than to analyze all execution flows. EHA relies on two key ideas to achieve this goal. First, it slices the entire execution flows by using each event handler as an individual entry point, which amounts to considering a given web app as a collection of smaller web apps. This slicing has the effect of breaking the loop structures in the existing event modelings shown in Fig. 1. Second, in order to analyze sliced event control flows in various contexts, EHA constructs an initial abstract heap for each entry point that contains the information necessary to analyze a given event control flow by abstracting dynamically collected states. More specifically, EHA takes two components, a *dynamic event generator* and a *static analyzer*, collects concrete values of non-local variables of event functions via the dynamic event generator, and abstracts the collected values using the static analyzer.

Let us compare static, dynamic, and *EH*-based analyses with an example. We assume that the top-level code registers three event handlers: l, a, and b, where l denotes a load event handler, which precedes the others and runs once. In addition, a and b simulate a pop-up and its close button, respectively. Thus, we can represent the possible event sequences as a regular expression: l(ab)∗a?. For a given event sequence lababa, Fig. 2 represents the event flows analyzed by each analysis technique. A conservative static analysis contains infeasible event sequences like the ones starting with a or b, whereas a dynamic analysis covers only short prefixes out of infinitely many flows. The *EH*-based analysis slices the web app into three handler units: l, a, and b. Hence, there is no loop in the event modeling; each handler considers every prefix of the given event sequence that ends with itself. For example, the handler a considers la, laba, and lababa as possible event sequences. Moreover, instead of abstracting the evaluation result of each sequence separately and merging them, it first merges the evaluation results of each sequence just before the handler a (l, lab, and labab) and uses their abstraction as the initial heap for analyzing a, which covers more event flows.
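The prefix slicing described above can be computed directly (our own illustration; the helper name is hypothetical):

```python
def prefixes_ending_with(sequence, handler):
    """All prefixes of the observed event sequence whose last event is the
    given handler: the contexts in which that handler is analyzed."""
    return [sequence[:i + 1] for i in range(len(sequence)) if sequence[i] == handler]

seq = "lababa"
assert prefixes_ending_with(seq, "a") == ["la", "laba", "lababa"]
assert prefixes_ending_with(seq, "b") == ["lab", "labab"]

# EHA merges the states reached just BEFORE each occurrence of a handler
# (here l, lab, and labab for handler a) and abstracts the merged set
# into a single initial heap for analyzing that handler once:
contexts_before_a = [p[:-1] for p in prefixes_ending_with(seq, "a")]
assert contexts_before_a == ["l", "lab", "labab"]
```

Merging before abstraction is what lets one analysis run of a handler cover all of its observed calling contexts at once, instead of analyzing each prefix separately.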

**Fig. 2.** Event flows analyzed by (a) static, (b) dynamic, and (c) *EH* -based analyses.

**Fig. 3.** Overall structure of EHA

## **3 Technical Details**

This section discusses the EHA framework, which consists of five phases as shown in Fig. 3. Boxes denote modules and ellipses denote data. EHA takes three inputs: a web app (Web App) to analyze and find bugs in, and two modules to use as its components, a dynamic event sequence generator (Event Generator) and a static analyzer (Static Analyzer). During the first *instrumentation* phase, Instrumentor inserts code that dynamically collects states into the input web app. Then, during the *execution* phase, the Instrumented Web App runs on a browser, producing Collected States. One of the input modules, Event Generator, repeatedly receives states of the running web app and sends user events to it during this phase. In the third *unit building* phase, Unit Web App Builder constructs a small Unit Web App for each event handler from the Collected States. After the set of Unit Web Apps is analyzed by the other input module, Static Analyzer, in the *static analysis* phase, Alarm Aggregator summarizes the resulting set of Bug Reports and generates a Final Bug Report for the original input Web App in the final *alarm aggregation* phase. We now describe each phase in more detail.
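The five phases compose as a simple pipeline. The following sketch shows only the control flow; all function names are illustrative placeholders (not the actual EHA modules), and the stand-in bodies exist only to make the shape executable:

```python
def eha(web_app, event_generator, static_analyzer):
    """High-level control flow of the five EHA phases."""
    instrumented = instrument(web_app)                  # 1. instrumentation
    states = execute(instrumented, event_generator)     # 2. execution on a browser
    units = build_unit_web_apps(states)                 # 3. unit building
    reports = [static_analyzer(u) for u in units]       # 4. static analysis per EH unit
    return aggregate_alarms(reports)                    # 5. alarm aggregation

# Trivial stand-ins so the pipeline shape runs end to end:
def instrument(app): return ("instrumented", app)
def execute(app, gen): return [("entry", handler) for handler in gen]
def build_unit_web_apps(states): return sorted({s[1] for s in states})
def aggregate_alarms(reports): return sorted(set().union(*reports)) if reports else []

final = eha("app", "lab", lambda unit: {"alarm-in-" + unit})
assert final == ["alarm-in-a", "alarm-in-b", "alarm-in-l"]
```

The important structural point is that the static analyzer is applied once per event-handler unit rather than once to the whole program, and the per-unit bug reports are merged only at the end.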

```
Inst(h ≡ <head>)        = h.addChildFront(<script src="helper" />)
Inst(function f(···) b) = function f(···){
                            var envId = getNewEnvId(); var nonlocals = {x̄1:x1, ···};
                            pushCallStack(); collectState(nonlocals); b; popCallStack(); }
Inst(return x;)         = { var retVal = x; popCallStack(); return retVal; }
Inst(catch(e){ b })     = { popCallStack(); b }
Inst(x = e)             = x = e; update(x̄, x)
Inst(x ⊕)               = x ⊕; update(x̄, x, ⊕)
Inst(⊕ x)               = ⊕ x; update(x̄, x)
```
**Fig. 4.** Instrumentation rules (partial)

*Instrumentation Phase.* The first phase instruments a given web app so that the instrumented web app records dynamically collected states during execution. Figure 4 presents the instrumentation rules for the most important cases, where the unary operator ⊕ is either ++ or --. For presentation brevity, we abuse notation and write x̄ to denote the string representation of a variable name x. The *Inst* function converts the necessary JavaScript language constructs into ones that perform dynamic logging. For example, for each declaration of a function f, *Inst* inserts four statements before the function body and one statement after the function body to keep track of the non-local variables of f.

*Execution Phase.* The execution phase runs an instrumented web app on a browser using events generated by Event Generator. Because EHA is parameterized by the input Event Generator, it may be an automated testing tool or manual effort. The following definitions formally specify the concepts used in the execution phase and the rest of this section:


An execution of a web app σ is a sequence of states that are results of evaluation of the web app code. We omit how states change according to the evaluation of different language constructs, but focus on which states are collected during execution. A state s is a pair of a program point p denoting the source location of the code being evaluated and a heap h denoting a memory status. A heap is a map from addresses to objects. An address is a unique identifier assigned whenever an object is created, and an object is a map from fields to values. A field is an object property name and a value is either a primitive value or an address that denotes an object. For presentation brevity, we abuse *Object* to represent *Environment* as well, which is a map from variables to values. Then, EHA collects states at event callback entries during execution:

$$\text{Collected States}(\sigma) = \{\, s \mid s \in \sigma \text{ s.t. } s \text{ is at an event callback entry} \,\}$$

where a state is at an event callback entry if its program point is a function entry and its call stack depth is 1.
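This collection criterion amounts to a simple filter over the trace; the state encoding below ({pointKind, depth}) is our own illustrative representation, not EHA's data format.

```javascript
// CollectedStates(σ): keep states whose program point is a function entry
// and whose call stack depth is 1 (event callback entries).
const collectedStates = trace =>
  trace.filter(s => s.pointKind === "functionEntry" && s.depth === 1);

const trace = [
  { pointKind: "topLevel", depth: 0 },       // global code: dropped
  { pointKind: "functionEntry", depth: 1 },  // event callback entry: kept
  { pointKind: "functionEntry", depth: 2 },  // nested call: dropped
];
collectedStates(trace); // keeps only the depth-1 function entry
```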

*Unit Building Phase.* As shown in Fig. 3, this phase constructs a set of sliced unit web apps using the dynamically collected states. More specifically, it divides the collected states into *EH* units, and then for each *EH* unit u, it constructs an *initial summary* $\hat{s}^u\_I$ that contains merged values of the non-local variables from the states in u. As discussed in Sect. 2.1, an event handler consists of three components: an event target, an event type, and a callback function. Thus, we design an *EH* unit u as a triple of an abstract event target φ, an event type τ, and a program point p:

$$\begin{array}{l} u \in \mathbb{U} = AbsEventTarget \times EventType \times \mathbb{P} \\ \phi \in AbsEventTarget = DOMTreePosition \uplus \mathbb{A} \\ \tau \in EventType \end{array}$$

While we use the same concrete event types and program points for *EH* s, we abstract concrete event targets to maintain a modest number of event targets. We assume the static analyzer expresses analysis results as summaries. A summary sˆ is a map from a pair of a program point and a context to an abstract heap:

$$
\hat{s} \in \hat{\mathbb{S}} = \mathbb{P} \times \mathit{Context} \to \hat{\mathbb{H}} \qquad\qquad c \in \mathit{Context}
$$

where *Context* is parameterized by an input static analyzer of EHA.

For each dynamically collected state s = (p, h) with an event target o and an event type τ both contained in h, Unit Web App Builder calculates an *EH* unit u as follows:

$$\begin{aligned} u &= \alpha\_s(s) = (\alpha\_o(o),\ \tau,\ p) \\ \text{where } \alpha\_o(o) &= \begin{cases} \mathit{DOMTreePosition}(o) & \text{if } o \text{ is attached to the DOM} \\ o & \text{otherwise} \end{cases} \end{aligned}$$

where *DOMTreePosition*(o) represents the DOM tree position of o as the sequence of child indices from the root node of the DOM. Then, it constructs an initial summary $\hat{s}^u\_I$ for each unit u as follows:

$$\hat{s}\_I^u(p,c) = \begin{cases} \hat{h}\_u^{\text{init}} & \text{if } p \text{ is the global entry point } \wedge \ c = \epsilon\\ \bot\_{\hat{\mathbb{H}}} & \text{otherwise} \end{cases}$$

The initial summary maps all pairs of program points and contexts to the heap bottom $\bot\_{\hat{\mathbb{H}}}$ denoting no information, except that it maps the pair of the global entry program point and the empty context to the initial abstract heap $\hat{h}^{\text{init}}\_u = \bigsqcup\_i \alpha\_h(h\_i)$ where $s\_i \in \text{Collected States} \wedge \alpha\_s(s\_i) = u \wedge s\_i = (p\_i, h\_i)$. In other words, the initial abstract heap for a unit u is the join of the abstractions of all heaps in the collected states that are mapped to u. The heap abstraction $\alpha\_h$ and the abstract heap join are parameterized by the input static analyzer.
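The event-target abstraction used in the unit construction above can be sketched as follows; the miniature DOM encoding (parent/children fields) is our own stand-in for a real browser tree.

```javascript
// DOMTreePosition(o): the sequence of child indices from the root to o.
function domTreePosition(node) {
  const path = [];
  while (node.parent) {
    path.unshift(node.parent.children.indexOf(node));
    node = node.parent;
  }
  return path;
}

// Tiny stand-in DOM: root -> [head, body], body -> [button]
const root = { parent: null, children: [] };
const head = { parent: root, children: [] };
const body = { parent: root, children: [] };
root.children.push(head, body);
const button = { parent: body, children: [] };
body.children.push(button);

domTreePosition(button); // body is child 1 of root, button child 0 of body
```

Two targets attached at the same tree position thus abstract to the same *EH* unit, which keeps the number of event targets modest even when many concrete DOM nodes are created over time.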

*Static Analysis Phase.* The static analysis phase analyzes each sliced unit web app one by one and detects bugs in it. Let us call the static analyzer that EHA takes as its input SA. Without loss of generality, assume that SA performs a whole-program analysis that computes the analysis result $\hat{s}\_{\mathit{final}}$ from the initial summary $\hat{s}\_I$ by computing the least fixpoint of a semantics transfer function $\hat{F}$: $\hat{s}\_{\mathit{final}} = \mathit{leastFix}\ \lambda \hat{s}.\ (\hat{s}\_I \sqcup \hat{F}(\hat{s}))$, and then reports alarms for possible bugs. We call an instance of EHA that takes SA as its input static analyzer EHASA. Then, for each *EH* unit u, EHASA performs an *EH*-based analysis that computes its analysis result $\hat{s}^u\_{\mathit{final}}$ from the initial summary $\hat{s}^u\_I$ constructed during the unit building phase, by computing the least fixpoint of the same semantics transfer function: $\hat{s}^u\_{\mathit{final}} = \mathit{leastFix}\ \lambda \hat{s}.\ (\hat{s}^u\_I \sqcup \hat{F}(\hat{s}))$. It also reports alarms for possible bugs in each unit u.
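The fixpoint scheme shared by both analyses can be sketched generically. The toy abstract domain below (program points mapped to sets of values, with pointwise union as join) is our own illustration, not SAFE's actual domain.

```javascript
// Generic least-fixpoint solver for s_final = leastFix λs. (s_I ⊔ F(s)).
function leastFix(init, transfer, join, equal) {
  let cur = init;
  for (;;) {
    const next = join(init, transfer(cur));
    if (equal(cur, next)) return cur;
    cur = next;
  }
}

// Toy domain: a summary maps program points to sets of values.
const join = (a, b) => {
  const out = {};
  for (const p of new Set([...Object.keys(a), ...Object.keys(b)]))
    out[p] = new Set([...(a[p] || []), ...(b[p] || [])]);
  return out;
};
const show = s =>
  JSON.stringify(Object.keys(s).sort().map(p => [p, [...s[p]].sort()]));
const equal = (a, b) => show(a) === show(b);

// Transfer function: values at "entry" flow to "exit".
const transfer = s =>
  ({ entry: new Set(s.entry), exit: new Set([...s.exit, ...s.entry]) });

const sInit = { entry: new Set([1, 2]), exit: new Set() };
const sFinal = leastFix(sInit, transfer, join, equal);
// sFinal.exit is now {1, 2}
```

In EHASA, the only thing that changes per *EH* unit is the initial summary; the transfer function and the fixpoint iteration are exactly those of the underlying analyzer.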

*Alarm Aggregation Phase.* The final phase combines all bug reports from the sliced unit web apps and constructs a final bug report. Because source locations of bugs in a bug report from a unit web app differ from those in the original input web app, Alarm Aggregator resolves such differences. Since a single source location in the original web app may appear multiple times in differently sliced unit web apps, Alarm Aggregator also merges bug reports for the same source location.

## **4 Implementation**

This section describes how we implemented the concrete data representation and each module shown as a dark box in Fig. 3 in our prototype implementation.

*Instrumentor.* The main idea of Instrumentor is similar to that of Jalangi [20], a JavaScript dynamic analysis framework, and we implemented the rules (partially) shown in Fig. 4. An instrumented web app collects states during execution by stringifying them and writing them to files. Dynamically collected information may include ordinary JavaScript values or built-in objects of JavaScript engines or browsers, which are often implemented in non-JavaScript, native languages. Because such built-in values are inaccessible from JavaScript code, we omit their values in the collected states. By contrast, ordinary JavaScript values are stringified in JSON format. A primitive value is stringified by JSON.stringify and stored in ValueMap. An object value is stored in two places—its pointer in Storage and its pointer identifier in ValueMap—and its property values are also recursively stringified and stored in StorageMap. The stringified document, ValueMap, and StorageMap are written to files at the end of execution, and Unit Web App Builder converts them to states in the unit building phase.
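A minimal sketch of the described serialization scheme, under our own assumed data layout (ValueMap maps variable names to stringified primitives or pointer ids; StorageMap maps pointer ids to stringified property maps):

```javascript
// Stringify a collected value: primitives via JSON.stringify; objects get a
// fresh pointer id, with their properties recursively stored in storageMap.
// Names and layout are our assumptions, not EHA's exact on-disk format.
let nextPtr = 0;
function serialize(value, storageMap, ptrs = new Map()) {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (!ptrs.has(value)) {
    const ptr = "#" + nextPtr++;
    ptrs.set(value, ptr);          // register before recursing (handles cycles)
    storageMap[ptr] = {};
    for (const [k, v] of Object.entries(value))
      storageMap[ptr][k] = serialize(v, storageMap, ptrs);
  }
  return ptrs.get(value);
}

const valueMap = {}, storageMap = {};
valueMap["score"] = serialize(42, storageMap);           // "42"
valueMap["state"] = serialize({ lives: 3 }, storageMap); // "#0"
// storageMap["#0"] is { lives: "3" }
```

Registering the pointer id before recursing is what lets such a scheme handle cyclic object graphs, which plain JSON.stringify rejects.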

**Fig. 5.** Contents in a JavaScript file of a unit web app

*Unit Web App Builder.* In our prototype implementation, the unit web app builder parses the collected states in JSON format and constructs a unit web app as multiple HTML files and one JavaScript file. The single JavaScript file contains all the information needed to build an initial abstract heap, as shown in Fig. 5: it contains modeling code for built-in objects at the top, declares the objects recorded in StorageMap and initializes their properties, and then declares and initializes non-local variables. At the bottom, the handler function is called.

Starting from the three variables above (handler, target, and arguments), we can fill in the contents of a unit web app using the collected states. For each variable, we get its value from the collected states and construct corresponding JavaScript code. When the value of a variable is a primitive value, we create a corresponding code fragment as a literal. For an object value, we get the value from StorageMap using its pointer id and repeat the process for its property values. For a function object value, we repeat the process for its non-local variables.
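This construction can be sketched as a small code generator; the pointer-id convention ("#0"), the variable naming, and the StorageMap layout are our own assumptions for illustration.

```javascript
// Emit JavaScript declarations for a recorded value: primitives become
// literal initializers; objects are declared once and their properties are
// filled in recursively (storageMap holds stringified property maps).
function emitValue(value, storageMap, out, emitted = new Map()) {
  if (!String(value).startsWith("#")) return String(value); // primitive literal
  const varName = "obj_" + value.slice(1);
  if (!emitted.has(value)) {
    emitted.set(value, varName);
    out.push(`var ${varName} = {};`);
    for (const [k, v] of Object.entries(storageMap[value]))
      out.push(`${varName}.${k} = ${emitValue(v, storageMap, out, emitted)};`);
  }
  return varName;
}

const storageMap = { "#0": { lives: "3", name: "\"p1\"" } };
const out = [];
out.push(`var state = ${emitValue("#0", storageMap, out)};`);
// out: ["var obj_0 = {};", "obj_0.lives = 3;", "obj_0.name = \"p1\";",
//       "var state = obj_0;"]
```

Declaring an object first and assigning its properties afterwards lets the generated code reproduce shared and cyclic object structures that a single object literal could not express.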

*Alarm Aggregator.* The alarm aggregator maintains a mapping between different source locations and eliminates duplicated alarms. It maps between locations in the original web app and locations in the sliced unit web apps. Our implementation keeps track of corresponding AST nodes in the different web apps and utilizes this information to map locations. It identifies duplicated alarms by string comparison of their bug messages and locations after mapping the source locations.
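A hedged sketch of this aggregation, where the location map and the alarm shape ({loc, msg}) are our own illustrative assumptions rather than EHA's actual data structures:

```javascript
// Map unit-app alarm locations back to the original app and deduplicate
// alarms that share the same message and mapped location.
function aggregate(unitReports, locMap) {
  const seen = new Set();
  const finalReport = [];
  for (const { loc, msg } of unitReports.flat()) {
    const origLoc = locMap.get(loc) || loc;  // fall back to the unit location
    const key = origLoc + "|" + msg;
    if (!seen.has(key)) {
      seen.add(key);
      finalReport.push({ loc: origLoc, msg });
    }
  }
  return finalReport;
}

// The same bug at app.js:42 shows up in two differently sliced unit apps:
const locMap = new Map([
  ["unit1.js:3", "app.js:42"],
  ["unit2.js:7", "app.js:42"],
]);
const report = aggregate(
  [[{ loc: "unit1.js:3", msg: "undefined converted to number" }],
   [{ loc: "unit2.js:7", msg: "undefined converted to number" }]],
  locMap);
// report: one merged alarm at app.js:42
```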

## **5 Experimental Evaluation**

In this section, we evaluate EHAman SAFE, an instantiation of EHA with manual event generation and SAFE [12], to answer the following research questions:

In the case of providing as many dynamic events as possible,


#### **5.1 Experimental Setup**

We studied 8 open-source game web apps [8], which were used in the evaluation of SAFE. They have various buttons and show event-dependent behaviors. The first two columns of Table 1 show the names and lines of code of the apps, respectively. The first four apps do not use any JavaScript libraries, and the remaining apps use the jQuery library version 2.0.3. They are all cross-platform apps that can run on Chrome, Chrome-extension, and Tizen environments.

To perform the experiments, we instantiated EHA with two inputs. As an Event Generator input, we chose manual event generation by an undergraduate researcher who had no prior knowledge of EHA. He was instructed to explore the behaviors of the web apps as much as possible, and he could check the number of functions called during execution as guidance. To make the execution environments simple enough to reproduce multiple times, we collected dynamic states from a browser without any cached data. As a Static Analyzer input, we used SAFE


**Table 1.** Analysis coverage of SAFE and EHAman SAFE.

because it can analyze the most JavaScript web apps among existing analyzers via state-of-the-art DOM tree abstraction [14,15], and it supports a bug detector [16]. We ran the apps with Chrome on a 2.9 GHz quad-core Intel Core i7 with 16 GB of memory in the execution phase. The other phases were conducted on Ubuntu 16.04.1 with an Intel Core i7 and 32 GB of memory.

#### **5.2 Answers to RQs**

*Answer to RQ1.* For the analysis coverage, we measured the numbers of analyzed functions and true positives of SAFE and EHAman SAFE. Because SAFE could not analyze the 4 apps that use jQuery within the timeout of 72 h, we considered only the other apps for SAFE.

Table 1 summarizes the numbers of analyzed functions. The 3rd to 5th columns show the numbers of registered event handler functions analyzed by both, by SAFE only, and by EHAman SAFE only, respectively. Similarly, the 6th to 8th columns show the numbers of all functions analyzed by both, by SAFE only, and by EHAman SAFE only, respectively. When we compare only the registered event handler functions among all the analyzed functions, EHAman SAFE outperforms SAFE. Even though SAFE was designed to be sound, it missed some behaviors; our investigation showed that the unsoundness was due to incomplete DOM modeling. For the numbers of analyzed functions, the analyses covered more than 75% of the functions in common. EHAman SAFE analyzed more functions for the first 3 subjects than SAFE due to missing event registrations caused by incomplete DOM modeling in SAFE. On the other hand, SAFE analyzed more functions for the 4th subject because EHAman SAFE missed flows during the execution phase. We studied the analysis result of the 4th subject in more detail and found flows that resume previously suspended execution by using cached data in a localStorage object. EHAman SAFE could not analyze these flows because it does not contain cached data, while SAFE could use a sound modeling of localStorage. Lastly, EHAman SAFE did not miss any true positives that SAFE detected, and it detected four more true positives in common functions as shown in Table 2, which implies that EHAman SAFE analyzed execution flows in those functions that SAFE missed. We explain Table 2 in more detail in the next answer.

*Answer to RQ2.* To compare analysis precision, we measured the numbers of false positives (FPs) in the alarm reports of SAFE and EHAman SAFE. Note that true positives (TPs) may not be considered "bugs" by app developers. For example, SAFE reports a warning when the undefined value is implicitly converted to a number because this is a well-known error-prone pattern, but it may be intentional behavior of a developer. Thus, TPs denote alarms that are reproducible in concrete executions, while FPs denote alarms that cannot be reproduced in any feasible execution. As for RQ1, we compare the analysis precision for the four apps that do not use jQuery.

Tables 2 and 3 categorize alarms in three categories: alarms reported by both SAFE and EHAman SAFE, alarms in functions commonly analyzed by both, and alarms in functions that are analyzed by only one. Table 2 shows numbers of TPs and


**Table 2.** Alarms reported by SAFE and EHAman SAFE.

**Table 3.** False alarms categorized by causes


FPs for each app, and Table 3 further categorizes the alarms in terms of their causes. Out of 21 common alarms, 6 are TPs and 15 are FPs. Among the 15 common FPs, 14 are due to the absence of DOM modeling and 1 is due to the unsupported getter and setter semantics. The functions commonly analyzed by both may produce different alarms because the analyses are based on different abstract heaps. We observed that 40 FPs from SAFE are due to its over-approximated event system modeling. In particular, the FPs in the 01 and 03 apps occur because top-level variables are initialized when non-load event handler functions are called, which implies that the event modeling of Fig. 1(b) would have a similar imprecision problem. By contrast, EHAman SAFE reported only 16 FPs, mostly (10 FPs) due to the absence of DOM modeling. The remaining six FPs, three from object joins and three from handler unit abstraction, are due to the inherent problem of static analysis that merging multiple values loses precision. Finally, for the functions analyzed by only one analyzer, all the reported alarms are FPs due to the absence of DOM modeling and omitted properties in the EHAman SAFE implementation. In short, EHAman SAFE could partially analyze more subjects than SAFE, and it improved the analysis precision by finding four more TPs and fewer FPs for commonly analyzed functions. In particular, its *handler unit abstraction* produced only three FPs, considerably fewer than the 40 FPs from the over-approximated event modeling in SAFE, without missing any TPs.

*Answer to RQ3.* To compare analysis scalability, we measured the execution time of each phase for both analyzers, as summarized in Table 4.


**Table 4.** Execution time (seconds) of each phase for SAFE and EHAman SAFE.

For SAFE, we measured the time taken to analyze the entire code, the top-level code, and the event loops: Total = Top-Level + Event Loop. For the four subjects that do not use any JavaScript libraries, the total analysis took at most 1276.6 s, of which 951.3 s were spent analyzing event loops. While SAFE finished analyzing the top-level code of the other subjects, which use jQuery, in at most 137.3 s, it could not finish analyzing their entire code within the timeout of 72 h (259,200 s).

For EHAman SAFE, because the maximum execution times of the instrumentation phase and the alarm aggregation phase are 10.3 s and 4.9 s, respectively, which are much smaller than those of the other phases, the table shows only the other phases. For the execution phase, we present the overhead of collecting states:

EHAman SAFE (Execution Phase): Total = #Call × Ave.

The 6th column presents the number of event handler function calls that Event Generator executed; each event handler function pauses for 3.24 s on average. To understand the performance overhead due to the instrumentation, we measured its slowdown by replacing all the instrumented helper functions with a function with an empty body. On the SunSpider benchmark, Jalangi showed a 30x slowdown and EHAman SAFE showed a 178x slowdown on average. We observed that collecting non-local variables for each function incurs significant performance overhead, which grows with the number of function calls.

The unit building phase takes time to generate unit web app code. Our investigation showed that the time heavily depends on the size of collected data. For the static analysis phase, we measured the analysis time of unit web apps except timeout (TO):

EHAman SAFE (Static Analysis Phase): Total = (#EH − #TO) × Ave. + 1200 × #TO

We analyzed each unit web app with a timeout of 1200 s. While the 02 app has no timeouts, the 07 app has 87 timeouts out of 94 unit web apps. On average, the analysis of 38% (25/66) of the unit web apps timed out. Note that even for the first four apps, for which SAFE finished its analysis, EHAman SAFE had some timeouts. We conjecture that SAFE finished its analysis quickly because it missed some flows due to unsupported DOM modeling. By contrast, because EHAman SAFE analyzes more flows using dynamically collected data, it had several timeouts.

*Answer to RQ4.* To see how many event flows EHAman SAFE covers with a limited time budget, let us consider the four apps that SAFE did not finish in 72 h from Tables 1 and 4. EHAman SAFE finished 19% (42/225) of the units within the timeout of 1200 s as shown in Table 4, and the average analysis time excluding timeouts was 76.0 s. This implies that web apps have event flows that can be analyzed in about 76 s, so it may be worthwhile to analyze such simple event flows quickly first to find bugs in them. Starting with the 42 units, EHAman SAFE covered 78 functions as shown in Table 1. While SAFE could not provide any bug reports for the four apps using jQuery, EHAman SAFE reported 6 alarms from the analyzed functions.

## **6 Related Work**

Researchers have studied event dependencies to analyze event flows more precisely. Madsen *et al.* [13] proposed event-based call graphs, which extend traditional call graphs with behaviors of event handlers such as the registration and triggering of events. While they do not consider analysis of DOM state changes and event capturing/bubbling behaviors, EHA addresses them by utilizing dynamically collected states. Sung *et al.* [21] introduced DOM event dependency and exploited it to test JavaScript web apps. Their tool improved the efficiency of event testing, but it has not yet been applied to static analysis of event loops.

Taking advantage of both static and dynamic analysis is not a new idea [5]. For JavaScript analysis, researchers have tried to precisely analyze dynamic features of JavaScript [7] and DOM values of web apps [23,24]. Alimadadi *et al.* [1] proposed a DOM-sensitive change impact analysis for JavaScript web apps. The JavaScript Blended Analysis Framework (JSBAF) [26] collects dynamic traces of a given app and specializes dynamic features of JavaScript, such as eval calls and reflective property accesses, using the collected traces. JSBAF analyzes each trace separately and combines the results, whereas EHA first abstracts the collected states per *EH* and then analyzes the units to obtain generalized contexts. Finally, Ko *et al.* [11] proposed a tunable static analysis framework that utilizes a light-weight pre-analysis. Similarly, our work builds an approximation of selected executions by constructing an initial abstract heap from dynamic information, which makes it possible to analyze complex event flows, albeit partially.

## **7 Conclusion and Future Work**

Because existing JavaScript static analyzers conservatively approximate event-driven flows, even state-of-the-art analyzers often fail to analyze event flows in web apps within a timeout of several hours. We present EHA, a bug detection framework that performs a novel *EH*-based static analysis using dynamically collected state information. As a general framework, EHA is parameterized by a way to generate event sequences and by a JavaScript static analyzer. We present EHAman SAFE, an instantiation of EHA with manual event generation and the SAFE JavaScript static analyzer. Our experimental evaluation shows that the *EH*-based analysis (EHAman SAFE) reduced the false positives that the whole-program analysis (SAFE) reports due to its over-approximated event system modeling. Moreover, EHAman SAFE finished analyzing partial execution flows of the web apps that SAFE failed to analyze within the timeout of 72 h. We plan to inspect the soundness issues caused by the lack of DOM modeling in whole-program analyzers in systematic ways via dynamic analyses [3,6,25], and to use an automated testing tool as the dynamic event generator instead of manual generation.

**Acknowledgment.** The research leading to these results has received funding from National Research Foundation of Korea (NRF) (Grants NRF-2017R1A2B3012020 and 2017M3C4A7068177).

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Software Design and Verification

# **Hierarchical Specification and Verification of Architectural Design Patterns**

Diego Marmsoler(B)

Technische Universität München, Munich, Germany diego.marmsoler@tum.de

**Abstract.** Architectural design patterns capture architectural design experience and provide abstract solutions to recurring architectural design problems. Their description is usually expressed informally and it is not verified whether the proposed specification indeed solves the original design problem. As a consequence, an architect cannot fully rely on the specification when implementing a pattern to solve a certain problem. To address this issue, we propose an approach for the specification and verification of architectural design patterns. Our approach is based on interactive theorem proving and leverages the hierarchical nature of patterns to foster reuse of verification results. The following paper presents FACTum, a methodology and corresponding specification techniques to support the formal specification of patterns. Moreover, it describes an algorithm to map a given FACTum specification to a corresponding Isabelle/HOL theory and shows its soundness. Finally, the paper demonstrates the approach by verifying versions of three widely used patterns: the singleton, the publisher-subscriber, and the blackboard pattern.

**Keywords:** Architectural design patterns · Interactive theorem proving · Dynamic architectures · Algebraic specification · Configuration traces

## **1 Introduction**

Architectural design patterns capture architectural design experience and provide abstract solutions to recurring architectural design problems. They are an important concept in software engineering and are regarded as one of the major tools to support an architect in the conceptualization and analysis of software systems [1]. The importance of patterns has resulted in a panoply of pattern descriptions in the literature [1–3]. These usually consist of a description of some key architectural constraints imposed by the pattern, such as involved data types, types of components, and assertions about the activation/deactivation of components as well as connections between component ports. These descriptions are usually highly informal, and the claim that they indeed solve a certain design problem remains unverified. As a consequence, an architect cannot fully rely on a pattern's specification to solve a design problem faced during the development of a new architecture. Moreover, verified pattern descriptions are a necessary precondition for automatic pattern conformance analyses, since missing assertions in a pattern's specification render their detection impossible. Compared to concrete architectures, architectural design patterns pose several new challenges to specification as well as verification:


This is why traditional techniques for the specification and verification of concrete architectures are not well-suited for the specification and verification of patterns.

Therefore, we propose an approach for the formal specification and verification of architectural design patterns based on interactive theorem proving [4]. Our approach is built on top of a pre-existing model of dynamic architectures [5,6] and its formalization in Isabelle/HOL [7], which comes with a calculus to support reasoning about such architectures [8]. Our approach provides techniques to specify patterns and corresponding design problems and allows a specification to be mapped to a corresponding Isabelle/HOL theory [9]. The theory and the corresponding calculus can then be used to verify that a specification indeed solves the design problem the pattern claims to solve.

With this paper, we elaborate on our previous work by providing the following contributions: First, we present FACTum, a novel approach for the formal specification of architecture design patterns. Second, we provide an improved version of the algorithm to map a given FACTum specification to a corresponding Isabelle/HOL theory and show soundness of the mapping. Third, we demonstrate the approach by specifying and verifying versions of three architectural design patterns: the singleton pattern, the publisher subscriber pattern, and the blackboard pattern.

The remainder of the paper is structured as follows: In Sect. 2, we provide necessary background on interactive theorem proving and configuration traces (our model of dynamic architectures). We then describe our approach to specify patterns in Sect. 3. To this end, we define the notion of (hierarchical) pattern specification and demonstrate it by specifying three architectural design patterns. In Sect. 4, we first define the semantics of a pattern specification in terms of configuration traces. Then, we provide an algorithm to map a given specification to a corresponding Isabelle/HOL theory and show its soundness, i.e., that the semantics of a specification is indeed preserved by the algorithm. We proceed with an overview of related work in Sect. 5 and conclude the paper in Sect. 6 with a brief discussion about how the approach addresses the challenges C1–C3 identified above.

#### **2 Background**

In the following, we provide some background on which our work is built.

#### **2.1 Interactive Theorem Proving**

Interactive theorem proving (ITP) is a semi-automatic approach to the development of formal theories. To this end, a number of proof assistants [4] have been developed to support humans in the development of formal proofs. Since our approach is based on Isabelle/HOL [9], in the following we describe some relevant features of this specific prover.

In general, Isabelle is an LCF-style [10] theorem prover based on Standard ML. It provides a so-called meta-logic on which different object logics are based. Isabelle/HOL is one of them, implementing higher-order logic for Isabelle. It integrates a prover IDE and comes with an extensive library of theories from various domains. New theories are then developed by defining terms of a certain type and deriving theorems from these definitions. Data types can be specified in Isabelle/HOL in terms of freely generated, inductive data type definitions [11]. Axiomatic specification of data types is also supported in terms of type classes [12]. To support the specification of theories over the data types, Isabelle/HOL provides tools for inductive definitions and recursive function definitions. Moreover, Isabelle/HOL provides a structured proof language called Isabelle/Isar [13] and a set of logical reasoners to support the verification of theorems. Modularization of theories is achieved through the notion of locales [14] in which an interface is specified in terms of sets of functions (called parameters) with corresponding assumptions about their behavior. Locales can extend other locales and may be instantiated by concrete definitions of the corresponding parameters.

#### **2.2 A Model of Dynamic Architectures**

Since architectures implementing an ADP may be dynamic as well (in the sense that components of a certain type can be instantiated over time), our approach is based on a model of dynamic architectures. One way to model such architectures is in terms of *sets* of configuration traces [5,6], i.e., streams [15,16] over architecture configurations. Thereby, architecture configurations can be thought of as snapshots of the architecture during execution. Thus, they consist of a set of (active) components with their ports valuated by messages and connections between the ports of the components. Moreover, components of a certain type may be parametrized by a set of messages.

*Example 1 (Configuration trace).* Assume that A, ..., Z and 1, ..., 9 are messages. Figure 1 depicts a configuration trace t with corresponding architecture configurations t(0) = k0, t(1) = k1, and t(2) = k2. Architecture configuration k1, for example, consists of two active components named c1 and c2. Thereby, component c1 is parametrized by {A}, has one input port i0 valuated with {8}, and three output ports o0, o1, and o2, valuated with {1}, {G}, and {7}, respectively.

**Fig. 1.** Configuration trace with its first three architecture configurations.

Note that the model allows components to be valuated by a set of messages, rather than just a single message, at each point in time. To evaluate the behavior of a single component, the model comes with an operator $\Pi\_c(t)$ to extract the behavior of a single component c out of a given configuration trace t.

The model of configuration traces is also implemented by a corresponding Isabelle/HOL theory which is available through the archive of formal proofs [7]. The implementation formalizes a configuration trace as a function *trace* = *nat* → *cnf* and provides an interface to the model in terms of a locale "dynamic component". The locale can be instantiated with components of a dynamic architecture by providing definitions for two parameters:


For each dynamic component instantiating the locale, a set of definitions is provided to support the specification of its behavior [17]. Moreover, a calculus to reason about the behavior of the component in a dynamic context is provided [8].

## **3 Specifying Architectural Design Patterns**

In the following, we describe FACTum, an approach to specify architectural design patterns. We first define the different parts of a pattern specification and then explain each part in more detail. We conclude the section with exemplary specifications of three patterns: the singleton, the publisher-subscriber, and the blackboard pattern. Thereby, the publisher component is modeled as an instance of the singleton, and the blackboard pattern is specified as an instance of the publisher-subscriber pattern.

**Definition 1 (Pattern specification).** *A* pattern specification *is a 5-tuple* (*VAR*, *DS*, *IS*, *CT*, *AS*)*, consisting of:*

*– variables VAR* = (V, V′, C, C′)*,*

*– a data type specification DS, consisting of*

	- *a* signature Σ = (S, F, B)*, containing sorts* S *and function/predicate symbols* F*/*B *for a pattern's data types,*
	- *a set of* data type assertions *DA, specifying the meaning of the signature symbols in terms of a set of axioms, and*
	- *a set of generator clauses Gen to construct data types,*

*– an interface specification IS, consisting of*

	- *a set of ports P and a corresponding type function tp* : *P* → S*, which assigns a sort to each port, and*
	- *a set of interfaces IF with elements* (*CP*, *IP*, *OP*) ∈ *IF, each with input ports IP* ⊆ *P, output ports OP* ⊆ *P, and a set of configuration parameters CP* ⊆ *P.*

Since a pattern specification may also instantiate other pattern specifications, we require that for each instantiated pattern (*VAR*′, *DS*′, *IS*′, *CT*′, *AS*′), the specification contains an additional *port instantiation* (η*i*′)*i*′∈*IF*′, with *injective* functions η*i*′ : *CP*′ ∪ *IP*′ ∪ *OP*′ → *CP* ∪ *IP* ∪ *OP*, such that η*i*′(*CP*′) ⊆ *CP*, η*i*′(*IP*′) ⊆ *IP*, and η*i*′(*OP*′) ⊆ *OP*, for some (*CP*, *IP*, *OP*) ∈ *IF*. Thereby, we require that for each (*CP*′, *IP*′, *OP*′) ∈ *IF*′ and p′ ∈ *CP*′ ∪ *IP*′ ∪ *OP*′, the corresponding data type refines the type of p′, i.e., that *tp*(η*i*′(p′)) refines *tp*′(p′).

In the following, we explain the different parts of a FACTum specification in more detail.

#### **3.1 Specifying Data Types**

The data types involved in a pattern specification can be specified using *algebraic specification techniques* [18,19]. Algebraic specifications usually consist of two parts: first, a signature Σ = (S, F, B), specifying a set of sorts S and function/predicate symbols F/B, typed by lists of sorts; in addition, a set of axioms *DA* that assigns meaning to the symbols of Σ. These axioms specify the characteristic properties of the data types used by a pattern specification and are formulated over the symbols of F and B, respectively. Finally, a data type specification may require that all elements of a type are constructed by corresponding constructor terms *Gen*, i.e., that each element of the type is built up from symbols of *Gen*.
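As an informal illustration of the three ingredients (signature, axioms, generators), the following hypothetical Python sketch uses a toy specification of natural numbers; it is not taken from the paper, and the names are invented:

```python
# Toy algebraic specification: one sort "nat", symbols zero/suc/add,
# freely generated by {zero, suc}.
SORTS = {"nat"}
FUNCS = {"zero": ([], "nat"), "suc": (["nat"], "nat"), "add": (["nat", "nat"], "nat")}
GEN = {"zero", "suc"}  # generator clauses: every nat is built from these

# A model interpreting the symbols over Python integers.
zero = lambda: 0
suc = lambda n: n + 1
add = lambda m, n: m + n

# Data type assertions (axioms), checked on a finite fragment of the model:
for x in range(10):
    assert add(zero(), x) == x                   # add(zero, x) = x
    for y in range(10):
        assert add(suc(x), y) == suc(add(x, y))  # add(suc x, y) = suc(add(x, y))
print("axioms hold on the checked fragment")
```

The generator set plays the same role as the *Gen* clauses above: it rules out "junk" elements that cannot be built from `zero` and `suc`.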

## **3.2 Specifying Interfaces**

Interfaces are specified in two steps: First, ports are specified by providing a set of ports P and a corresponding mapping *tp* : P → S, which determines the type of data that may be exchanged through each port. Then, a set of interfaces (*CP*, *IP*, *OP*) is specified by declaring input ports *IP* ⊆ P, output ports *OP* ⊆ P, and a set of configuration parameters *CP* ⊆ P. Configuration parameters are a way to parametrize components of a certain type; they can be thought of as ports with a predefined value that is fixed for each component.
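A possible Python rendering of interfaces as triples (CP, IP, OP) over a port set P with typing function tp — purely illustrative, with hypothetical names and the publisher interface of Sect. 3.6 as data:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

@dataclass(frozen=True)
class Interface:
    cp: FrozenSet[str]  # configuration parameters CP ⊆ P
    ip: FrozenSet[str]  # input ports IP ⊆ P
    op: FrozenSet[str]  # output ports OP ⊆ P

def well_formed(i: Interface, ports: FrozenSet[str], tp: Dict[str, str]) -> bool:
    """All declared ports must come from P, and each must have a sort under tp."""
    used = i.cp | i.ip | i.op
    return used <= ports and all(p in tp for p in used)

ports = frozenset({"sb", "nt", "prob"})
tp = {"sb": "subscription", "nt": "event", "prob": "PROB"}
publisher = Interface(cp=frozenset(), ip=frozenset({"sb"}), op=frozenset({"nt"}))
print(well_formed(publisher, ports, tp))  # True
```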

Interfaces can then be specified using so-called *configuration diagrams*, graphical depictions of the involved interfaces (see Sect. 3.6 for examples). Each interface consists of two parts: a name, followed by a list of configuration parameters (enclosed in brackets). Input and output ports are represented by empty and filled circles, respectively.

## **3.3 Specifying Component Types**

Component types are specified by assigning assertions about the input/output behavior to the interfaces. Thereby, configuration parameters can be used to distinguish between different components of a certain type.

The assertions are expressed as linear temporal logic formulæ [20] over the signature Σ, using port names as free variables. For example, the term "□(c.p = POS −→ c.o ≥ 1)" asserts that the valuation of port o of every component c whose configuration parameter p has the value POS (for positive) is greater than or equal to 1 throughout the execution of the system.
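The finite-trace intuition behind such an assertion can be sketched in Python. This is an illustrative approximation only — the paper's semantics is over infinite configuration traces:

```python
# Finite-trace check of the example assertion □(c.p = POS → c.o ≥ 1):
# at every step, if configuration parameter p is POS, output o must be ≥ 1.
def always(pred, trace):
    """□pred over a finite trace prefix: pred holds at every position."""
    return all(pred(state) for state in trace)

trace = [
    {"p": "POS", "o": 3},
    {"p": "NEG", "o": -2},  # antecedent false, so the implication holds here
    {"p": "POS", "o": 1},
]
holds = always(lambda s: s["p"] != "POS" or s["o"] >= 1, trace)
print(holds)  # True
```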

#### **3.4 Specifying Activation and Connection Assertions**

Finally, a set of assertions about the activation and deactivation of components, as well as assertions about connections between component ports, is specified. Both types of assertions may be expressed in terms of so-called *configuration trace assertions*, i.e., linear temporal logic formulæ with special predicates to denote the activation of components and port connections. Thereby, c.p denotes the valuation of port p of a component c (with a corresponding predicate denoting that port p of c is valuated at all), ‖c‖ denotes that component c is currently active, and c.p → c′.p′ denotes that output port p of component c is connected to input port p′ of component c′.

## **3.5 Specifying Pattern Instantiations**

As described above, pattern specifications may be built on top of other pattern specifications by instantiating their component types. Such instantiations can be directly specified in a pattern's configuration diagram by annotating the


**Fig. 3.** Specification of the publisher subscriber pattern.

corresponding interfaces. To denote that a component type t′ of the specification is an instance of component type t (from the instantiated pattern), we simply write t′ : t, followed by a corresponding *port mapping* [p*i*, p*o* → p′*i*, p′*o*], which assigns a port of t′ to each port of t.

## **3.6 Example: An Initial Pattern Hierarchy**

In the following, we demonstrate the FACTum approach by specifying variants of three well-known patterns: the singleton pattern, the publisher subscriber pattern, and the blackboard pattern. Thereby, the publisher component of the publisher subscriber pattern is modeled as an instance of the singleton, whereas the blackboard pattern is specified by instantiating the publisher subscriber pattern.

**Singleton.** The singleton pattern is used for dynamic architectures in which, for a certain type of component, only one instance should be active at any point in time. Figure 2 depicts a possible specification of the pattern in terms of a configuration diagram and a corresponding activation specification. Since the pattern is only concerned with the activation of components, it requires neither data type nor port specifications.

*Interfaces.* The interface is specified by the configuration diagram in Fig. 2a: It consists of a single interface *Singleton* and does not require any special ports.

*Architectural Assertions.* Activation assertions are formalized by the specification depicted in Fig. 2b: Eq. 1 requires that there exists a component c which is always activated, and Eq. 2 requires this component to be unique. In our version of the singleton, the singleton component is not allowed to change over time, which is why variable c is declared rigid in Fig. 2b. Other versions of the singleton are possible in which the singleton may change over time.

**Publisher Subscriber.** We now specify a version of the publisher subscriber pattern. This pattern is used for architectures in which so-called subscriber components can subscribe to certain messages from other, so-called publisher components. Figure 3 depicts a possible specification of the pattern in terms of a data type specification, a port specification, and a corresponding configuration diagram.

*Data Types.* In a publisher subscriber pattern we usually have two types of messages: subscriptions and unsubscriptions. Figure 3b depicts the corresponding data type specification. Subscriptions are modeled as a *parametric* data type over two type parameters: a type id for component identifiers and a type evt denoting events to subscribe for. The data type is freely generated by the constructor terms "sub id evt" and "unsub id evt", meaning that every element of the type has the form "sub id evt" or "unsub id evt".
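Free generation can be mirrored with Python's frozen dataclasses — a hypothetical sketch, with the type parameters id and evt instantiated by strings:

```python
from dataclasses import dataclass

# The subscription type of Fig. 3b, freely generated by two constructors.
@dataclass(frozen=True)
class Sub:
    id: str
    evt: str

@dataclass(frozen=True)
class Unsub:
    id: str
    evt: str

# "Freely generated" means every value is exactly one constructor application:
# distinct constructor terms denote distinct values (no confusion), and there
# are no values outside the constructors' images (no junk).
assert Sub("s1", "e") != Unsub("s1", "e")
assert Sub("s1", "e") == Sub("s1", "e")
print("sub/unsub behave as a freely generated type")
```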

*Ports.* Two port types are specified over these data types by the specification given in Fig. 3c: a type sb for exchanging subscriptions to a specific event, and a type nt for exchanging messages associated with any event.

*Interfaces.* The configuration diagram in Fig. 3a specifies the interfaces of the two types of components: An interface *Publisher* is defined with an input port *sb* to receive subscriptions and an output port *nt* to send out notifications. Moreover, an interface *Subscriber* is defined with an input port *nt* to receive notifications and an output port *sb* to send out subscriptions. As stated above, we want a publisher to be unique and always activated, which is why it is specified as *Publisher* : *Singleton*, meaning that it is an instance of the *Singleton* type of the singleton pattern specification.

*Architectural Assertions.* Activation assertions for publisher subscriber architectures are mainly inherited from the singleton pattern: since a publisher is specified to be a singleton, a publisher component is unique and always activated. Moreover, two connection assertions for publisher subscriber architectures are specified in Fig. 4: Eq. (3) requires a publisher's input port sb to be connected to the corresponding output port of every active subscriber which sends some


**Fig. 4.** Architectural constraints for the publisher subscriber pattern.

message. Equation (4), on the other hand, requires a subscriber's input port nt to be connected to the corresponding output port of the publisher, whenever the latter sends a message for which the subscriber is subscribed.

**Blackboard.** We conclude our example by specifying a dynamic version of the blackboard pattern. A blackboard architecture is usually used for the task of collaborative problem solving, i.e., a set of components work together to solve an overall, complex problem. Our specification of the pattern is depicted in Fig. 5 and consists of a data type specification, port specification, and corresponding configuration diagram.

*Data Types.* Blackboard architectures usually work with *problems* and *solutions* for them. Figure 5b provides a specification of the corresponding data types. We denote by PROB the set of all problems and by SOL the set of all solutions. Complex problems consist of *subproblems*, which can be complex themselves. To solve a problem, its subproblems have to be solved first. Therefore, we assume the existence of a *subproblem relation* ≺ ⊆ PROB × PROB. For complex problems, the *details* of this relation may not be known in advance. Indeed, one of the benefits of a blackboard architecture is that a problem can be solved even without knowing the exact nature of this relation in advance. However, the subproblem relation has to be well-founded (Eq. (5)) for a problem to be solvable; in particular, we do not allow for cycles in the transitive closure of ≺. While there may be different approaches to solve a problem (i.e., several ways to split a problem into subproblems), we assume, without loss of generality, that the final solution for a problem is always unique. Thus, we assume the existence of a function *solve* : PROB → SOL which assigns the *correct* solution to each problem. Note, however, that it is not known in advance *how* to compute this function; computing it is indeed one of the reasons for using this pattern.
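For a finite fragment of ≺, well-foundedness amounts to the transitive closure being cycle-free, which can be checked with a standard depth-first search. This is a hypothetical sketch, not part of the paper's formalization:

```python
# For a *finite* subproblem relation, well-foundedness of ≺ (Eq. (5))
# amounts to the absence of cycles in its transitive closure.
def is_well_founded(rel):
    """rel: set of (sub, parent) pairs. DFS-based cycle check."""
    succ = {}
    for a, b in rel:
        succ.setdefault(a, set()).add(b)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    def dfs(v):
        color[v] = GREY
        for w in succ.get(v, ()):
            if color.get(w, WHITE) == GREY:
                return False           # back edge: cycle, so ≺ is not well-founded
            if color.get(w, WHITE) == WHITE and not dfs(w):
                return False
        color[v] = BLACK
        return True
    return all(dfs(v) for v in succ if color.get(v, WHITE) == WHITE)

prec = {("p1", "p0"), ("p2", "p0"), ("p3", "p1")}   # p1, p2 ≺ p0 and p3 ≺ p1
print(is_well_founded(prec))                         # True
print(is_well_founded(prec | {("p0", "p3")}))        # False: p0 ≺ p3 ≺ p1 ≺ p0
```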




**Fig. 5.** Specification of the blackboard pattern.

*Ports.* In Fig. 5c, we specify four ports for the pattern:


Moreover, a configuration parameter *prob* is specified to parametrize knowledge sources according to the problems p ∈ PROB they can solve.

*Interfaces.* A blackboard pattern usually involves two types of components: blackboards and knowledge sources. The corresponding interfaces are specified by the configuration diagram in Fig. 5a. Since our version of the blackboard pattern is specified to be an instance of the publisher subscriber pattern, we import the corresponding pattern specification in the header of the diagram. We then specify two interfaces. The blackboard interface is denoted *BB* and is declared to be an instance of a *Publisher* component in a publisher subscriber pattern. It consists of two input ports *rp* and *ns* to receive required subproblems and new solutions. Moreover, it specifies two output ports *op* and *cs* to communicate currently open problems and solutions for all currently solved problems. Thereby, port *rp* is specified to be an instance of port sb of a publisher and port *cs* to be an instance of a publisher's nt port.

The interface for knowledge sources is denoted *KS* and is declared to be an instance of a *Subscriber* component in a publisher subscriber pattern. Note that each knowledge source can only solve certain problems, which is why a knowledge source is parameterized by a problem "*prob*". The specification of ports actually mirrors the corresponding specification of the blackboard interface. Thus, a knowledge source is required to have two input ports *op* and *cs* to


**Fig. 6.** Specification of behavior for blackboard components.

receive currently open problems and solutions for all currently solved problems, and two output ports *rp* and *ns* to communicate required subproblems and new solutions. Thereby, port *rp* is specified to be an instance of a subscriber's sb port and port *cs* to be an instance of a subscriber's nt port, respectively.

*Component Types.* A blackboard provides the *current state* towards solving the original problem and forwards problems and solutions from knowledge sources. Figure 6 provides a specification of the blackboard's behavior in terms of three behavior assertions:


Note that the last assertion (Eq. (8)) is formulated using a *weak* until operator, which is defined as follows: γ′ W γ ≝ □(γ′) ∨ (γ′ U γ).
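The definition can be mirrored over finite trace prefixes in Python. This is an illustrative approximation with invented names; on infinite traces, □ and U have their usual LTL semantics:

```python
# Weak until over a finite trace prefix: γ' W γ  ≝  □γ' ∨ (γ' U γ).
def until(g1, g2, trace):
    """γ' U γ: γ holds at some position, and γ' holds at all earlier positions."""
    for state in trace:
        if g2(state):
            return True
        if not g1(state):
            return False
    return False

def weak_until(g1, g2, trace):
    """γ' W γ: either γ' holds everywhere (□γ'), or γ' U γ."""
    return all(g1(s) for s in trace) or until(g1, g2, trace)

is_zero, is_one = lambda s: s == 0, lambda s: s == 1
print(weak_until(is_zero, is_one, [0, 0, 0]))  # True: □γ', even though γ never occurs
print(until(is_zero, is_one, [0, 0, 0]))       # False: strong until requires γ
```

The difference between the two operators is exactly the first disjunct: weak until does not oblige γ to ever occur.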

A knowledge source receives open problems via *op* and solutions for other problems via *cs*. It may contribute to the solution of the original problem by solving currently open subproblems. Figure 7 provides a specification of the knowledge source's behavior in terms of four behavior assertions:



**Fig. 7.** Specification of behavior for knowledge source components.


**Fig. 8.** Specification of activation constraints for blackboard architectures.

*Architectural Assertions.* Activation constraints for blackboards are mainly inherited from the singleton pattern: since a blackboard is specified to be an instance of a publisher, which is in turn an instance of a singleton, a blackboard component is unique and always activated. An activation constraint for knowledge sources is provided in Fig. 8 by Eq. (13): whenever a knowledge source (able to solve a problem *pp*) is notified about a request to solve *pp*, it stays active until *pp* is indeed solved. Connection assertions for the blackboard pattern are mainly inherited from the corresponding specification of the publisher subscriber pattern (for ports *rp* and *cs*, respectively). Two additional assertions, however, are provided in Fig. 8: Eq. 14 requires input ports *op* of active blackboard components to be connected to the corresponding output ports of knowledge sources, and Eq. 15 requires a similar property for port *ns*.

## **4 Verifying Architectural Design Patterns**

In the previous section we presented FACTum, a methodology and corresponding techniques to specify architectural design patterns. In doing so, we relied on an intuitive understanding of the semantics of the techniques. In the following, we first provide a formal definition of the semantics of a FACTum specification. Then, we describe an algorithm to map a given specification to a corresponding Isabelle/HOL theory, and we show soundness of the algorithm.

#### **4.1 Semantics of Pattern Specifications**

The semantics of a pattern specification is given in terms of sets of configuration traces introduced in Sect. 2.

**Definition 2 (Semantics of Pattern Specification).** *The semantics of a pattern specification* (*VAR*, *DS*, *IS*, *CT*, *AS*) *is given by a 5-tuple* (A, P, T, C, *AT*)*, consisting of:*

*– an algebra* A =


*such that for all port interpretations* δ : P → P *(injective mappings which respect tp and* T *), variable interpretations* ι: V → A *and* ι : V → A*, and component variable interpretations* κ: C → C *and* κ : C → C *(respecting interface types) the following conditions hold:*


#### **4.2 Mapping to Isabelle/HOL**

Algorithm 1 describes how to systematically transfer a pattern specification to a corresponding Isabelle/HOL theory. The transformation is done in four main steps: (i) the specified data types are transferred to corresponding Isabelle/HOL data type specifications; (ii) an Isabelle locale is created for the pattern, which imports other locales for each instantiated pattern; (iii) specifications of component behavior are added as assumptions; (iv) activation and connection assertions are added as assumptions.

The following soundness criterion guarantees that Algorithm 1 indeed preserves the semantics of a pattern specification.

**Theorem 1 (Soundness of Algorithm 1).** *For every pattern specification PT and every model* T *of the Isabelle/HOL locale (as specified in* [21]*) generated by Algorithm 1, there exists a* T′ *such that* T′ |= *PT (as defined by Definition 2) and* T′ *is isomorphic to* T*; and vice versa.*

Note that the generated theory is based on Isabelle/HOL's implementation of configuration traces [7]. Thus, for each component type, a calculus is instantiated which provides a set of rules to reason about the behavior specification of components of that type.

**Algorithm 1.** Mapping a pattern specification to an Isabelle/HOL theory.

**Input:** (*VAR*, *DS*, *IS*, *CT*, *AS*) {pattern specification according to Definition 1}
**Output:** An Isabelle/HOL theory for the specification

1: create Isabelle/HOL data type specification for *DS*
2: create Isabelle/HOL locale for the pattern
3: **for all** interfaces i = (*CP*, *IP*, *OP*) ∈ *IF* **do**
4:   **if** i instantiates a component of another pattern **then**
5:     import the corresponding locale
6:     create instance of ports according to δ*i*
7:   **else**
8:     import locale "dynamic component" of theory "Configuration Traces" [8]
9:   **end if**
10:   create instance of locale parameters *tCMP* and *active*
11:   **for all** configuration parameters p ∈ *CP* which are not instances **do**
12:     create locale parameter p of type *tp*(p)
13:     create locale assumption "∀x. ∃c. x = p(c)"
14:   **end for**
15:   **for all** ports p ∈ *IP* ∪ *OP* which are not instances **do**
16:     create locale parameter p of type *tp*(p)
17:   **end for**
18:   **for all** behavior assertions b ∈ *CT<sub>i</sub>* **do**
19:     create locale assumption for b using definitions of theory "Configuration Traces" [8]
20:   **end for**
21: **end for**
22: **for all** activation/connection assertions c ∈ *AS* **do**
23:   create locale assertion for c
24: **end for**
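As a rough illustration of the first two steps of the mapping, the following hypothetical Python sketch emits a textual skeleton of such a theory; the Isabelle syntax is only approximated, and the function and theory names are invented for illustration:

```python
# Hypothetical sketch of steps (i)-(ii): emit Isabelle/HOL datatype
# declarations and a locale skeleton from a tiny pattern specification.
def emit_theory(name, datatypes, interfaces):
    lines = [f"theory {name} imports Configuration_Traces begin"]
    for dt, ctors in datatypes.items():                 # step (i): data types
        lines.append(f"datatype {dt} = " + " | ".join(ctors))
    for iface, (cp, ip, op) in interfaces.items():      # step (ii): one locale per interface
        lines.append(f"locale {iface} = dynamic_component +")
        for p in cp + ip + op:                          # port/parameter fixes
            lines.append(f"  fixes {p}")
    lines.append("end")
    return "\n".join(lines)

theory = emit_theory(
    "Singleton",
    datatypes={},                        # the singleton needs no data types
    interfaces={"singleton": ([], [], [])},
)
print(theory)
```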

#### **4.3 Example: Pattern Hierarchy**

Algorithm 1 can be used to transfer a given pattern specification to a corresponding Isabelle/HOL theory, where it is subject to formal verification. We demonstrate this by applying it to the specifications of the singleton, publisher subscriber, and blackboard patterns presented in Sect. 3.6. To demonstrate the verification capabilities, we then prove one characteristic property for each pattern. The corresponding Isabelle/HOL theory files are provided online [22].

**Singleton.** We first state a basic property for singleton components, ensuring that there indeed exists a *unique* component of the corresponding type which is always activated:

$$\exists! c \colon \Box \left( \left\| c \right\| \right).\tag{16}$$

**Publisher Subscriber.** Let us now turn to the publisher subscriber pattern. First of all, remember that the publisher component was specified as an instance of the singleton pattern, which is why all results from the verification of the singleton pattern are lifted to the publisher component. Thus, we get a result equivalent to Eq. (16) for free. Moreover, we can use the additional assertions imposed by the specification to establish another property for the publisher subscriber pattern, which guarantees that a subscriber indeed receives all the messages for which it is subscribed:


Note that the proof of the above property is based on Eq. (16) inherited from the singleton pattern. Indeed, the hierarchical nature of FACTum allows for reuse of verification results from instantiated patterns.

**Blackboard.** Again, the property verified for singletons (Eq. (16)) as well as the property verified for publisher subscriber architectures (Eq. (17)) are inherited by the blackboard specification. In the following, we use these properties to verify another property for blackboard architectures: a blackboard pattern guarantees that if, for each open (sub-)problem, there eventually exists an active knowledge source able to solve the corresponding problem:

$$
\Box \left( \forall p' \in bb'.op \colon \Diamond \left( \left\| ks_{p'} \right\| \right) \right), \tag{18}
$$

then the architecture is guaranteed to eventually solve the overall problem, even if no single knowledge source is able to solve it on its own:

$$
\Box \left( p' \in bb'.rp \longrightarrow \Diamond \left( (p', solve(p')) \in bb'.cs \right) \right). \tag{19}
$$

#### **5 Related Work**

Related work can be found in three different areas.

*Formal Specification of Architectural Styles.* Over the past years, several approaches have emerged to support the formal specification of architectural design patterns. One of the first attempts in this direction was Wright [23], which provided the possibility to specify architectural styles, a notion similar to our notion of architectural design patterns. More recent approaches to specify styles are based on the BIP framework [24] and provide logics [25] as well as graphical notations [26] to specify styles. There are, however, two differences between these approaches and the work presented in this paper. One difference concerns the expressive power of the specification techniques: while the above approaches focus mainly on the specification of patterns for static architectures, we allow for the specification of static as well as dynamic architectures. Another difference arises from the scope of the work: while the above approaches focus mainly on the specification of patterns, our focus is more on the verification of such specifications.

*Verification of Architectural Styles and Patterns.* Recently, some approaches have emerged which focus on the verification of architectural styles and patterns. Kim and Garlan [27], for example, apply the Alloy [28] analyzer to automatically verify architectural styles specified in ACME [29]. A similar approach comes from Wong et al. [30], who apply Alloy to the verification of architectural models. Zhang et al. [31] applied model checking techniques to verify architectural styles formulated in Wright#, an extension of Wright. Similarly, Marmsoler and Degenhardt [32] apply model checking to the verification of design patterns. Another approach comes from Wirsing et al. [33], where the authors apply rewriting logic to specify and verify cloud-based architectures. While all these approaches focus on the verification of architectures and architectural patterns, they all apply automatic verification techniques. While this has many advantages, verification is limited to properties amenable to automatic verification. Indeed, with our work we complement these approaches by providing an alternative based on interactive theorem proving, rather than automatic verification techniques.

*Interactive Theorem Proving for Software Architectures.* Another area of related work can be found in applications of interactive theorem proving to software architectures in general. Fensel and Schnogge [34], for example, apply the KIV interactive theorem prover to verify concrete architectures in the area of knowledge-based systems. Their work differs from ours in two main aspects: (i) while they focus on the verification of concrete architectures, we propose an approach to verify architectural patterns; (ii) while they focus on the verification of static architectures, our approach allows for the verification of dynamic architectures. Thus, we complement their work by providing a more general approach. More recently, some attempts were made to apply interactive theorem proving to the verification of architectural connectors. Li and Sun [35], for example, apply the Coq proof assistant to verify connectors specified in Reo [36]. With our work we complement their approach since we focus on the verification of patterns, rather than connectors.

To summarize, to the best of our knowledge, this is the first attempt to apply interactive theorem proving to the verification of architectural design patterns.

## **6 Conclusion**

In this paper we presented a novel approach for the specification and verification of architectural design patterns. To this end, we provided a methodology and corresponding specification techniques for the specification of patterns in terms of configuration traces. We then described an algorithm to map a given specification to a corresponding Isabelle/HOL theory and showed soundness of the algorithm. Our approach can be used to formally specify patterns in a hierarchical way. Using the algorithm, the specification can then be mapped to a corresponding Isabelle/HOL theory, where the pattern can be verified using a pre-existing calculus. We demonstrated this by specifying and verifying versions of three architecture patterns: the singleton, the publisher subscriber, and the blackboard. Thereby, patterns were specified hierarchically, and verification results for lower-level patterns were reused in the verification of higher-level patterns.


The proposed approach addresses the challenges for pattern verification identified in the introduction as follows:

In order to achieve our overall vision of interactive, hierarchical pattern verification [37], future work is needed in two directions: We are currently working on an implementation of the approach for the Eclipse Modeling Framework [38], in which a pattern can be specified and a corresponding Isabelle/HOL theory can be generated using the algorithm presented in this paper. In a second step, we want to lift the verification results back to the architecture level, hiding the complexity of the interactive theorem prover and interpreting its output at the architecture level.

**Acknowledgments.** We would like to thank Veronika Bauer, Maximilian Junker, and all the anonymous reviewers of FASE 2018 for their comments and helpful suggestions on earlier versions of this paper. Parts of the work on which we report in this paper were funded by the German Federal Ministry of Education and Research (BMBF) under grant no. 01Is16043A.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Supporting Verification-Driven Incremental Distributed Design of Components**

Claudio Menghi1(B) , Paola Spoletini<sup>2</sup> , Marsha Chechik<sup>3</sup> , and Carlo Ghezzi<sup>4</sup>

<sup>1</sup> Chalmers *<sup>|</sup>* University of Gothenburg, Gothenburg, Sweden claudio.menghi@gu.se <sup>2</sup> Kennesaw State University, Marietta, USA pspoleti@kennesaw.edu <sup>3</sup> University of Toronto, Toronto, Canada chechik@cs.toronto.edu <sup>4</sup> Politecnico di Milano, Milan, Italy carlo.ghezzi@polimi.it

**Abstract.** Software systems are usually formed by multiple components which interact with one another. In large systems, components themselves can be complex systems that need to be decomposed into multiple sub-components. Hence, system design must follow a systematic approach, based on a recursive decomposition strategy. This paper proposes a comprehensive verification-driven framework which provides support for designers during development. The framework supports hierarchical decomposition of components into sub-components through formal specification in terms of pre- and post-conditions as well as independent development, reuse and verification of sub-components.

## **1 Introduction**

Software is usually not a monolithic product: it is often comprised of multiple components that interact with each other to provide the desired functionality. Components themselves can be complex, requiring their own decomposition into sub-components. Hence, system design must follow a systematic approach, based on a recursive decomposition strategy that yields a modular structure. A good decomposition and a careful specification should allow components and sub-components to be developed in isolation by different development teams, delegated to third parties [32], or reused off-the-shelf.

In this context, guaranteeing correctness of the system under development becomes particularly challenging because of the intrinsic tension between two main requirements. On the one hand, to handle complexity, we need to enable development of sub-components where only a partial view of the system is available [28]. On the other hand, we must ensure that independently developed and verified (sub-)components can be composed to guarantee global correctness of the resulting system. Thus, we believe that component development should be supported by a process that (1) is intrinsically iterative; (2) supports decentralized development; and (3) guarantees correctness at each development stage.

**The p&d running example.** The p&d system supports furniture purchase and delivery. It uses two existing web services, which implement furniture sale and delivery, as well as a component that implements the user interface. These are modeled by the labeled transition systems shown in Fig. 1a-1c. The p&d component under design is responsible for interaction with these components, which form its execution environment. The overall system must ensure satisfaction of the properties informally described in Fig. 1d.

**Fig. 1.** The p&d running example; panel (d) lists the properties of the p&d system.

The need for supporting incremental development of components has been widely recognized. Some approaches [15,37] synthesize a partial model of components from properties and scenarios and facilitate an iterative development of this model through refinement. Others [7,8,10,26,27] provide support for checking and refining partial models, with the goal of preserving correctness when such systems get refined. However, while these techniques guarantee correctness at each development stage, they do not address the problem of decentralized development.

In this paper, we describe a unified framework called FIDDle (a Framework for Iterative and Distributed Design of components) which supports decentralized top-down development. FIDDle supports a formal specification of global properties, a decomposition process, and the specification of component interfaces, providing a set of tools to guarantee correctness of the different artifacts produced during the process. The main contribution of the paper is a method for supporting an iterative and distributed verification-driven component development process through a coherent set of tools. Specific novel contributions are (1) a new formalism, called *Interface Partial Labelled Transition System (IPLTS)*, for specifying components through a decomposition that encapsulates sub-components into unspecified black-box states; (2) an approach to specify *the expected behavior of black-box states* via pre- and post-conditions expressed in Fluent Linear Time Temporal Logic; and (3) a notion of *component correctness* and a *local verification procedure* that *guarantees preservation of global properties* once the components are composed.

**Fig. 2.** Overview of the application of FIDDle for developing a component. Thick-bordered components are implemented in FIDDle. Thick-dashed bordered components are currently supported by the theory presented in this paper, but are not yet fully implemented. Thin-dashed bordered components are not discussed in this work.

We illustrate FIDDle using a simple example: the *purchase*&*delivery* (p&d) example [14,29] – see Fig. 1. We evaluate FIDDle on a realistic case study obtained by reverse-engineering the executive module of the Mars Rover developed at NASA [12,17,18]. Scalability is evaluated by considering randomly-generated examples.

**Organization.** Section 2 provides an overview of FIDDle. Section 3 gives the necessary background. Section 4 presents Interface Partial Labelled Transition Systems (IPLTS). Section 5 defines a set of algorithms for reasoning on partial components and describes their implementation. Section 6 reports on an evaluation of the proposed approach. Section 7 compares FIDDle with related approaches, and Sect. 8 concludes. Proofs for the theorems in the paper can be found in the Appendix available at http://ksuweb.kennesaw.edu/~pspoleti/fase-appendix.pdf; the source code, a video of the tool and a complete replication package can be found at https://github.com/claudiomenghi/FIDDLE.

## **2 Overview**

FIDDle is a verification-driven environment supporting incremental and distributed component development. A high-level view of FIDDle is shown in Fig. 2. FIDDle allows a component to be developed incrementally through a set of development phases in which human insight and experience are exploited (rounded boxes labeled with a designer icon or a recycle symbol, to indicate design or reuse, respectively) and phases in which automated support is provided (square boxes labeled with a pair of gearwheels). Automated support allows verifying the current state of the design, synthesizing parts of the partial component, or checking whether a designed sub-component can correctly fit into the original design. The FIDDle development phases are described below.

**Creating an Initial Component Design.** This phase is identified in Fig. 2 with the symbol 1 . The development team formalizes the properties that the component has to guarantee and designs an initial, high-level structure of the component. The initial component design is created using a state-based formalism that can clearly identify parts (called "sub-components" in this paper), represented as *black-box* states, whose internal design is delayed to a later stage or split apart for distributed development by other parties. In the following, we refer to other states as "regular". Black-box states are enriched with an *interface* that provides information on the universe of events relevant to the black-box. They are also decorated with pre- and post-conditions that allow distributed teams to develop sub-components without the need to know about the rest of the system. The *contract* of a black-box state consists of its interface and pre- and post-conditions.

In the p&d example, the environment (assumed as given) in which the p&d component will be deployed is composed of the furniture-sale component (Fig. 1a), the shipping component (Fig. 1b) and the user (Fig. 1c). A possible initial design for the p&d component is shown in Fig. 3c. It contains the regular states 1 and 3 and the black-box states 2 and 4. The initial state is state 1. Whenever a *userReq* event is detected, the component moves from the initial state 1 into the black-box state 2, which represents a sub-component in charge of managing the user request. The event *offerRcvd*, which indicates that an offer is provided to the user, labels the transition to state 3. The pre- and post-conditions for black-box states 2 and 4 are shown in Fig. 3b. Events *prodInfoReq*, *infoRcvd*, *shipInfoReq* and *costAndTime* can occur while the component is in the black-box state 2. The pre-condition requires that there is a user request that has not yet been handled, while the post-condition ensures that the furniture-sale and the shipping services provided info on the product and on delivery cost and time. FIDDle supports the developer in checking properties of the initial component design.

The *realizability checker* confirms the existence of an integration that completes the partially specified component and ensures the satisfaction of the properties of interest. If such a component does not exist, the designer needs to redesign the partially-specified component. The *well-formedness checker* verifies that both the pre- and the post-conditions of black-box states are satisfiable. Finally, the *model checker* verifies whether the (partial) component (together with its contract) guarantees satisfaction of the properties of interest.

In the p&d example, the model checker identifies a problem with the partial solution sketched in Fig. 3c. No matter how the black-box state 2 is to be defined, the p&d component cannot satisfy property *P4* since every time *reqCanc* occurs it is preceded by *usrAck*. This suggests a re-design of the p&d component, which may lead to a new model, shown in Fig. 3d. This model includes two regular states: state 1, in which the component waits for a new user request, and state 3, in which the component has provided the user with an offer and is waiting for an answer. The user might accept (*userAck*) or reject (*userNack*) an offer and, depending on this choice, either state 4 or 5 is entered. States 2, 4 and 5 are black-box states, to be refined later. The designer also provides pre- and post-conditions for the black-box states. Pre- and post-conditions of the black-box state 2 specify that there is a pending user request, and that cost, time and product information are collected. Pre- and post-conditions of the black-box state 4 specify that *infoRcvd* has occurred after the user request, and both the product and the shipping requests are performed. Finally, pre- and post-conditions of the black-box state 5 specify that *infoRcvd* has occurred after the user request and before entering the state, and both the product and the shipping requests are cancelled when leaving the state. This model is checked using the provided tools; since it passes all the checks, it can be used in the next phase of the development.

**Fig. 3.** The p&d running example: artifacts produced by FIDDle. (g) Integration of the sub-component of Fig. 3e and the component of Fig. 3d.

The design team may choose to refine the component or *distribute* the development of unspecified sub-components (represented by black-box states) to other (internal or external) development teams. In both cases, a sub-component can be designed by considering only the contract of the corresponding black-box state. Each team can develop the assigned sub-component or reuse existing components.

**Sub-component Development.** This phase is identified in Fig. 2 with the symbol 2 . Each team can design the assigned sub-component using any available technique, including manual design (left side), reuse of existing sub-components (right side) or synthesis of new ones from the provided specifications (center). The only constraints are that (1) given the stated pre-condition, the sub-component has to satisfy its post-condition, and (2) the sub-component should operate in the same environment as the overall partially specified component. Sub-component development can itself be an iterative process, but neither the model of the environment nor the overall properties of the system can be changed during this process; otherwise, the resulting sub-component cannot be automatically integrated into the overall system.

In the p&d example, development of the sub-component for the black-box state 2 is delegated to an external contractor. Candidate sub-components are shown in Fig. 3e-f. In the former case, the component requests shipping info details and waits until the shipping service provides the shipment cost and time. Then it queries the furniture-sale service to obtain the product info. In the latter case, the shipping and the furniture services are queried, but the sub-component does not wait for an answer from the furniture-sale service. Since these candidates are fully defined, the well-formedness check is not needed. Yet, the *substitutability checker* confirms that of these, only the sub-component in Fig. 3e satisfies the post-condition in Fig. 3b.

**Sub-component Integration.** This phase is identified in Fig. 2 with the symbol 3 . FIDDle guarantees that if each sub-component is developed correctly w.r.t. the contract of the corresponding black-box state, the component obtained by integrating the sub-components is also correct. In the p&d example, the sub-component in Fig. 3e passes the substitutability check and can be a valid implementation of the black-box state 2 in Fig. 3d. Their integration is shown in Fig. 3g.

## **3 Preliminaries**

The model of the environment and the properties of interest are expressed using Labelled Transition Systems and Fluent Linear Time Temporal Logic.

**Model of the Environment.** Let Act be the universal set of observable events and let Actτ = Act ∪ {τ}, where τ denotes an unobservable local event. A *Labeled Transition System (LTS)* [20] is a tuple A = ⟨Q, q0, αA, Δ⟩, where Q is the set of states, q0 ∈ Q is the initial state, αA ⊆ Act is a finite set of events, and Δ ⊆ Q × (αA ∪ {τ}) × Q is the transition relation. The parallel composition operation ∥ is defined as usual (see for example [14]).

**Properties.** A fluent [33] *Fl* is a tuple ⟨I_Fl, T_Fl, *Init_Fl*⟩, where I_Fl ⊂ Act, T_Fl ⊂ Act, I_Fl ∩ T_Fl = ∅ and *Init_Fl* ∈ {*true*, *false*}. A fluent may be *true* or *false*. A fluent is *true* if it has been initialized by an event i ∈ I_Fl at an earlier time point (or if it was initially *true*, that is, *Init_Fl* = *true*) and has not yet been terminated by another event t ∈ T_Fl; otherwise, it is *false*. For example, consider the LTS in Fig. 1c and the fluent *F ReqPend* = ⟨{*userReq*}, {*respOk*, *reqCanc*}, *false*⟩. *F ReqPend* holds in a trace of the LTS from the moment at which *userReq* occurs and until a transition labeled with *respOk* or *reqCanc* is fired. In the following, we use the notation F *Event* to indicate a fluent that is *true* when the event with label *event* occurs.
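The fluent semantics above can be sketched in a few lines of code. The following is a minimal illustration (the list-of-events trace encoding is our own and not part of FIDDle):

```python
def fluent_value(trace, initiating, terminating, initially=False):
    """Truth value of a fluent <initiating, terminating, initially>
    after observing a finite trace of events."""
    value = initially
    for event in trace:
        if event in initiating:
            value = True    # the fluent is (re-)initialized
        elif event in terminating:
            value = False   # the fluent is terminated
    return value

# F_ReqPend = <{userReq}, {respOk, reqCanc}, false> from the p&d example:
def req_pend(trace):
    return fluent_value(trace, {"userReq"}, {"respOk", "reqCanc"})
```

With this encoding, `req_pend(["userReq"])` is `True` while `req_pend(["userReq", "respOk"])` is `False`, matching the informal reading above.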

An FLTL formula is obtained by composing fluents with standard LTL operators: X (next), ◇ (eventually), □ (always), U (until) and W (weak until). For example, FLTL encodings of the properties *P1*, *P2*, *P3* and *P4* are shown in Fig. 3a.

Satisfaction of FLTL formulae can be evaluated over *finite* and *infinite* traces, by first constructing an FLTL interpretation of the infinite or finite trace and then evaluating the FLTL formulae over this interpretation. The FLTL interpretation of a finite trace is obtained by slightly changing the interpretation of infinite traces. The evaluation of FLTL formulae on a finite trace follows the standard interpretation of LTL operators over finite traces (see [13]). In the following, we assume that Definitions 5 and 4 (available in the Appendix) are used to evaluate whether an FLTL formula is satisfied on finite and infinite traces, respectively.

## **4 Modeling and Refining Components**

This section introduces a novel formalism for modeling and refining components. We define the notion of a partial LTS and then extend it with pre- and postconditions.

**Partial LTS.** A *partial LTS* is an LTS where some states are "regular" and others are "black-box". Black-box states model portions of the component whose behavior still has to be specified. Each black-box state is augmented with an interface that specifies the universe of events that can occur in the black-box. A *Partial LTS (PLTS)* is a structure P = ⟨A, R, B, σ⟩, where: A = ⟨Q, q0, αA, Δ⟩ is an LTS; Q is the set of states, s.t. Q = R ∪ B and R ∩ B = ∅; R is the set of *regular* states; B is the set of *black-box* states; σ : B → 2^αA is the *interface*. An LTS is a PLTS where the set of black-box states is empty. The PLTS in Fig. 3d is defined over the regular states 1 and 3, and the black-box states 2, 4 and 5. The interface specifies that events *prodInfoReq*, *infoRcvd*, *shipInfoReq* and *costAndTime* can occur in the black-box state 2.

**Definition 1.** *Given a PLTS* P = ⟨A, R, B, σ⟩ *defined over the LTS* A = ⟨Q_A, q0_A, αA, Δ_A⟩ *and an LTS* D = ⟨Q_D, q0_D, αD, Δ_D⟩*, the parallel composition* P ∥ D *is an LTS* S = ⟨Q_S, q0_S, αS, Δ_S⟩ *such that* Q_S = Q_A × Q_D*;* q0_S = (q0_A, q0_D)*;* αS = αA ∪ αD*; and the set of transitions* Δ_S *is defined as follows:*

– *if* (s, l, s′) ∈ Δ_A *and* l ∈ αA \ αD *or* l = τ*, then* ((s, t), l, (s′, t)) ∈ Δ_S*;*
– *if* (t, l, t′) ∈ Δ_D *and one of the following is satisfied: (1)* l ∈ αD \ αA*, (2)* l = τ*, or (3)* s ∈ B *and* l ∈ σ(s)*, then* ((s, t), l, (s, t′)) ∈ Δ_S*;*
– *if* (s, l, s′) ∈ Δ_A*,* (t, l, t′) ∈ Δ_D *and* l ∈ αA ∩ αD*,* l ≠ τ*, then* ((s, t), l, (s′, t′)) ∈ Δ_S*.*
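The three rules of Definition 1 translate directly into code. The following sketch enumerates Δ_S explicitly; the dictionary-based model encoding (keys `states`, `init`, `alphabet`, `delta`, `blackbox`) is our own illustration, not FIDDle's implementation:

```python
TAU = "tau"  # the unobservable local event

def compose(P, D):
    """Parallel composition P || D of a PLTS P with an LTS D (Definition 1).
    Models are dicts; 'delta' is a set of (src, label, dst) triples and
    P['blackbox'] maps each black-box state to its interface sigma(s)."""
    aA, aD = P["alphabet"], D["alphabet"]
    delta = set()
    # Rule 1: P moves alone on labels outside D's alphabet, or on tau.
    for (s, l, s2) in P["delta"]:
        if l in aA - aD or l == TAU:
            for t in D["states"]:
                delta.add(((s, t), l, (s2, t)))
    # Rule 2: D moves alone on labels outside P's alphabet, on tau, or on
    # interface events while P sits in a black-box state.
    for (t, l, t2) in D["delta"]:
        for s in P["states"]:
            if l in aD - aA or l == TAU or l in P["blackbox"].get(s, set()):
                delta.add(((s, t), l, (s, t2)))
    # Rule 3: synchronisation on shared, observable labels.
    for (s, l, s2) in P["delta"]:
        for (t, l2, t2) in D["delta"]:
            if l == l2 and l != TAU and l in aA & aD:
                delta.add(((s, t), l, (s2, t2)))
    return {"states": {(s, t) for s in P["states"] for t in D["states"]},
            "init": (P["init"], D["init"]),
            "alphabet": aA | aD,
            "delta": delta}
```

Note how rule 2 lets the environment fire interface events freely while the partial component is inside a black-box state, which is exactly what makes the black box permissive until it is refined.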

Given P, A, D defined above, the system S = P ∥ D and a state q of P, we say that a finite trace l0, l1, ..., ln of S *reaches* q if there exists a sequence ⟨s0, t0⟩, l0, ⟨s1, t1⟩, l1, ..., ln, ⟨q, tn+1⟩, where for every 0 ≤ i ≤ n, we have (⟨si, ti⟩, li, ⟨si+1, ti+1⟩) ∈ Δ_S. For example, considering the PLTS in Fig. 3d and the LTS in Fig. 1c, the finite trace obtained by performing a *userReq* event reaches the black-box state 2 of the PLTS.

Given a finite trace π = l0, l1, ..., ln (or an infinite trace l0, l1, ...) of S, we say that its sub-trace li, li+1, ..., lk is *inside* the black-box state b if one of the state sequences associated with π contains a sub-sequence of the form ⟨b, ti⟩, li, ⟨b, ti+1⟩, ..., lk, ⟨b, tk+1⟩, where li, li+1, ..., lk ∈ σ(b). Note that a sub-trace is a *finite* trace. For example, considering the parallel composition of the PLTS in Fig. 3d and the LTSs in Fig. 1c and 1b, and the finite trace associated with events *userReq*, *shipInfoReq*, *offerRcvd*, the sub-trace associated with *shipInfoReq* is inside the black-box state 2. This means that *shipInfoReq* must occur in the sub-component replacing the black-box state 2.
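Extracting the sub-traces that are inside a black-box state can be illustrated over an explicit run. The following sketch assumes a run is given as a list of step triples ((s, t), l, (s′, t′)); this encoding is our own, not FIDDle's:

```python
def subtraces_inside(steps, b, interface):
    """Collect the maximal sub-traces of a run that are inside black-box b:
    runs of consecutive steps whose source and target keep the partial
    component in b and whose labels belong to sigma(b)."""
    out, current = [], []
    for (s, _t), label, (s2, _t2) in steps:
        if s == b and s2 == b and label in interface:
            current.append(label)          # step stays inside b
        else:
            if current:                    # a sub-trace inside b just ended
                out.append(current)
            current = []
    if current:
        out.append(current)
    return out
```

For the run *userReq*, *shipInfoReq*, *offerRcvd* of the example above, only the sub-trace containing *shipInfoReq* is inside state 2.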

**Adding Pre- and Post-conditions.** The intended behavior of a sub-component refining a black-box state can be captured using pre- and post-conditions. The *contract* for the sub-component associated with a box consists of the box interface and its pre- and post-conditions. Given the universal set *FLTL* of the FLTL formulae, an *Interface PLTS* (IPLTS) I is a structure ⟨A, R, B, σ, *pre*, *post*⟩, where ⟨A, R, B, σ⟩ is a PLTS, *pre* : B → *FLTL* and *post* : B → *FLTL*.

For each black-box state b, the function *pre* specifies a constraint that must be satisfied by all *finite* traces of P that reach b. For example, the FLTL-expressed pre-condition for the black-box state 4 of the IPLTS in Fig. 3d requires that any trace of the composition between the IPLTS and an LTS that reaches the black-box state 4 provides info on the product to the user after his/her request.

For each black-box state b, the function *post* specifies *a post-condition* that constrains the behavior of the system in any sub-trace performed inside b. For example, the post-condition of the black-box state 4 of the IPLTS in Fig. 3d ensures that whenever this IPLTS is composed with an LTS, a product request and a shipping request are performed by the furniture-sale service while the system is inside the black-box state.

Given an IPLTS I and an LTS D, the *parallel composition* S between I and D is obtained by considering the PLTS P associated with I and the LTS D as specified in Definition 1. Given an IPLTS I, an LTS D and the *parallel composition* S between I and D, trace π of S is *valid* iff it is infinite and for every black-box state b, the post-condition post(b) holds in any sub-trace of π performed inside b.

**Definition 2.** *Given an LTS* D*, an IPLTS* I *is* well-formed *(over* D*) iff every valid trace of* S = I ∥ D *satisfies all the pre-conditions of black-box states of* I*.*

We say that S = I ∥ D *satisfies* an FLTL property φ if and only if φ is satisfied by every valid trace of S. In the p&d example, the post-condition (F *ProdReq*) ∧ (F *ShipReq*) of the black-box state 4 ensures that the parallel composition of the component in Fig. 3d and its environment satisfies *P3*.

**Sub-components and Their Integration.** Integration aims to replace black-box states of a given IPLTS with the corresponding sub-components. Given an IPLTS I, one of its black-box states b and its interface σ(b), *a sub-component for* b is an IPLTS R defined over the set of events σ(b). One state q_f^R of R is designated as the *final state* of R. Given a sub-component R, an LTS of its environment E, and a trace of the form πi; πe such that πi = l0, l1, ..., ln and πe = ln+1, ln+2, ..., lk, we say that πi; πe is *a trace of the parallel composition between* R *and* E if and only if (1) there exists a sequence q0, l0, q1, l1, ..., ln, qn in the environment such that for all i, where 0 ≤ i < n, (qi, li, qi+1) is a transition of E; (2) πe is obtained by R ∥ E considering qn as the initial state for the environment; and (3) πe reaches q_f^R. A sub-component is *valid* if it ensures that the traces of the parallel composition satisfy its post-conditions. Intuitively, a trace of the parallel composition between a sub-component R and the environment E is obtained by concatenating two sub-traces: πi and πe. The sub-trace πi corresponds to a set of transitions performed by the environment before the sub-component is activated, while πe is a trace the system generates while it is in the sub-component R.

**Definition 3.** *Given an IPLTS* I *with a black-box state* b*, the environment* E *and a sub-component* R *for* b*,* R *is a* substitutable sub-component *iff every trace* πi; πe *of the parallel composition between* R *and* E *is such that if* πi *satisfies pre(b) then* πe *guarantees post(b).*

Intuitively, whenever the sub-component is entered and the pre-condition *pre(b)* is satisfied (i.e., the trace πi satisfies *pre(b)*), then a trace of the parallel composition between the sub-component and the environment that reaches the final state of the sub-component must satisfy the post-condition *post(b)*.
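Over an explicit, finite set of traces, the check of Definition 3 amounts to a universally quantified implication. The following sketch uses stand-in Python predicates for the FLTL pre- and post-conditions (the encoding is our illustration, not FIDDle's evaluator):

```python
def substitutable(traces, pre, post):
    """Definition 3 over an enumerated set of traces: R is substitutable
    for b iff every trace pi_i; pi_e whose prefix pi_i satisfies pre(b)
    has a continuation pi_e satisfying post(b).

    `traces` is an iterable of (pi_i, pi_e) pairs; `pre` and `post` are
    predicates over finite traces."""
    return all(post(pi_e) for (pi_i, pi_e) in traces if pre(pi_i))
```

For instance, with a pre-condition "a user request is pending" and a post-condition "product and shipping info were requested", a candidate whose continuations omit *prodInfoReq* is rejected.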

A black-box state of an IPLTS C can be replaced by a substitutable sub-component R through an integration procedure. The resulting IPLTS C′ is called the *integration*. Intuitively, the integration procedure connects every incoming and outgoing transition of the considered black-box state to the initial and final state of the substitutable sub-component R, respectively. Integrating the sub-component R for black-box state 2 (Fig. 3e) into the component in Fig. 3d produces the IPLTS in Fig. 3g. The prefix "2." is used to identify the states obtained from R. The contracts of black-box states 4 and 5 are the same as those in Fig. 3b.

**Theorem 1.** *Given a well-formed IPLTS* C *and a substitutable sub-component* R *for a black-box state* b *of* C*, if* C *satisfies an FLTL property* φ*, then the integration* C′ *obtained by substituting* b *with* R *also satisfies* φ*.*

The sub-component R from Fig. 3e is substitutable; thus, integrating it into the partial component C shown in Fig. 3d ensures that the resulting integrated component C′ (Fig. 3g) preserves properties *P1* -*P4*.

## **5 Verification Algorithms**

In this section, we describe the algorithms for the analysis of partial components, which we have implemented on top of LTSA [25].

**Checking Realizability.** Realizability of a property φ is checked via the following procedure. Let E be the environment of the partial component C, and C_B be the LTS resulting from removing all black-box states and their incoming and outgoing transitions from C. Check C_B ∥ E |= φ. If φ is not satisfied, the component is not realizable: no matter how the black-box states are specified, there will be a behavior of the system that does not satisfy φ. Otherwise, compute C ∥ E (as specified in Definition 1) and model-check it against ¬φ. If the property ¬φ is satisfied, the component is not realizable. Indeed, all the behaviors of C ∥ E satisfy ¬φ, i.e., there is no behavior that the component can exhibit to satisfy φ. Otherwise, the component may be realizable. For example, the realizability checker shows that it is possible to realize a component refining the one shown in Fig. 3c while satisfying property *P2*. Specifically, it returns a trace that ensures that after a *userReq* event, the offer is provided to the user (the event *offerRcvd*) only if the furniture service has confirmed the availability of the requested product (the event *infoRcvd*).
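The two-step procedure can be sketched as follows; `model_check(M, f)` is an assumed oracle returning `True` iff every behavior of M satisfies f (in FIDDle this role is played by the LTSA-based checker), and the model arguments stand for the compositions C_B ∥ E and C ∥ E:

```python
def realizability_verdict(model_check, cb_env, c_env, phi, not_phi):
    """Sketch of the realizability check. `cb_env` is C_B || E (black-box
    states removed), `c_env` is C || E, `not_phi` is the negation of phi."""
    if not model_check(cb_env, phi):
        # Some behavior of the black-box-free part already violates phi:
        # no completion of the black boxes can repair it.
        return "not realizable"
    if model_check(c_env, not_phi):
        # Every behavior of C || E satisfies not-phi, so the component
        # cannot exhibit any behavior satisfying phi.
        return "not realizable"
    return "may be realizable"
```

Note the asymmetry: a positive answer is only "may be realizable", since the check is necessary but not sufficient for the existence of a correct completion.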

**Theorem 2.** *Given a component specified using an IPLTS* C*, its environment* E*, and a property of interest* φ*, the realizability checker returns "not realizable" if there is no component* C′ *obtained from* C *by integrating sub-components, s.t.* (C′ ∥ E) |= φ*.*

**Checking Well-Formedness.** Given a partial component C with a black-box state b annotated with a pre-condition *pre(b)* and its environment E, the well-formedness checker verifies whether *pre(b)* is satisfied in C as follows.

(1) *Transform post-conditions into LTSs*. Transform every FLTL post-condition post(bi) of every black-box state bi of C, including b, into an FLTL formula post′(bi) as specified in [13]. This transformation ensures that the *infinite* traces that satisfy post′(bi) have the form π, {*end*}^ω, where π satisfies post(bi). For each black-box state bi, the corresponding post-condition post′(bi) is transformed into an equivalent LTS, called LTS_bi, using the procedure in [37]. Since LTS_bi has traces of the form π, {*end*}^ω, it has a state s with an *end*-labelled self-loop. This self-loop is removed, and s is considered as the final state of LTS_bi. All other *end*-labeled transitions are replaced by τ-transitions. Each automaton LTS_bi contains all the traces that do not violate the corresponding post-condition.
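The end-of-trace bookkeeping in step (1) can be sketched as follows, assuming the transition relation is a set of (src, label, dst) triples (our own encoding, not FIDDle's implementation):

```python
TAU = "tau"  # stands for the unobservable event

def finalize_post_lts(delta, end="end"):
    """Remove the end-labelled self-loop from a post-condition LTS, return
    its state as the final state, and rename every remaining end transition
    to a tau transition."""
    # The state with an end-labelled self-loop marks where traces terminate.
    final = next(s for (s, label, s2) in delta if label == end and s == s2)
    new_delta = set()
    for (s, label, s2) in delta:
        if (s, label, s2) == (final, end, final):
            continue                       # drop the self-loop itself
        new_delta.add((s, TAU if label == end else label, s2))
    return new_delta, final
```

After this step the automaton accepts exactly the finite traces π (terminated in the final state) that do not violate the post-condition.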


In the p&d example, if we remove the clause F *InfoRcvd* from the post-condition of the black-box state 2, the p&d component is not well-formed since the pre-condition of state 4 is violated. The counterexample shows a trace that reaches the black-box state 4 in which an event *userReq* is not followed by *infoRcvd*. Adding F *InfoRcvd* to the post-condition of state 2 solves the problem.

**Theorem 3.** *Given a partial component* C *with a black-box state* b *annotated with a pre-condition pre(b) and its environment* E*, the well-formedness procedure returns true iff the valid traces of* C *satisfy the pre-condition pre(b).*

**Model Checking.** To check whether C ∥ E satisfies φ, we first construct an LTS C′ that generates only valid traces, by plugging into C the LTSs corresponding to all of its black-box states (as done in steps 1 and 2 of the well-formedness check), and then use a classical FLTL model checker to verify C′ ∥ E |= φ. If we consider the design of Fig. 3d and assume that the black-box state 2 is not associated with any post-condition, the model checker returns the counterexample *userReq*, τ, *offerRcvd* for property *P2*, since the sub-component that will replace the black-box state 2 is not forced to ask to book the furniture service. Adding the post-condition in Fig. 3b solves the problem.

**Theorem 4.** *The model checking procedure returns* true *iff every valid trace of* C ∥ E *satisfies* φ*.*

**Checking Substitutability.** Given the environment E, a component C with a black-box state b and pre- and post-conditions *pre(b)* and *post(b)*, and a sub-component R, this procedure checks whether R can be used in C in place of b. We first present a procedure assuming that R has no black-box states.


If R contains black-box states, checking R requires performing Steps (1) and (2) of the well-formedness check before running the substitutability procedure.

In the p&d example, the substitutability checker does not return any counterexample for the sub-component in Fig. 3e. Thus, the post-condition is satisfied and the sub-component can be integrated in place of the black-box state 2.

**Theorem 5.** *Let a component* C *with a black-box state* b*, its pre- and post-conditions pre(b) and post(b), a sub-component* R*, and* C*'s environment* E *be given. The substitutability checker returns true, indicating that* R *can be used in* C *in place of* b*, iff for every trace* π = πi; πe *of* R ∥ E*, if* πi *is the finite prefix of* E *satisfying pre(b) and* πe *is obtained by* R ∥ E *considering the final state of* πi *as the initial state of the environment, then* πe *satisfies post(b).*

## **6 Evaluation**

We aim to answer two questions: **RQ.1**: How effective is FIDDle w.r.t. supporting an iterative, distributed development of correct components? (Sect. 6.1) and **RQ.2**: How scalable is the automated part of the proposed approach? (Sect. 6.2).

#### **6.1 Assessing Effectiveness**

We simulated development of a complex component and analyzed FIDDle-provided support along the steps described in Sect. 2.

**Experimental Setup.** We chose the executive module of the K9 Mars Rover developed at NASA Ames [12,17,18], specified using LTSs. The overall size of the LTS is ∼10^7 states. The executive module was made up of several components: *Executive*, *ExecCondChecker*, *ActionExecution* and *Database*. *ExecCondChecker* was further decomposed into *db-monitor* and *internal*. Each of these components was associated with a shared variable (*exec*, *conditionList*, *action* and *db*, respectively) used to communicate with the other components; e.g., the *exec* variable was used by *ExecCondChecker* to communicate with *Executive*. Access to each shared variable was regulated through a condition variable and a lock. The complete model of the *Executive* component comprised 11 states, each further decomposed as an LTS. The final model of the *Executive* component was obtained by replacing these states with the corresponding LTSs. This model had about 100 states, which is realistic for a medium-size component [5,6,24].

We considered two properties: (P*1* ): *Executive* performed an action only after a new plan was read from *Database*; (P*2* ): *Executive* got the lock over the *condList* variable only after obtaining the *exec* lock.

*Creating an Initial Component Design.* We considered the existing model (*D3* ) of the *Executive* and abstracted portions of the complete model into black-box states to create two partial components *D1* and *D2* representing partial designs. To generate *D2* we encapsulated three states that receive plans and prepare for plan execution into the black-box state *Read Plans*. To generate *D1*, we also set one of the 10 states of the *Executive* whose corresponding LTS is in charge of executing a plan, i.e., state *ExecuteTaskAction*, as a black-box state. By following this procedure, *D3* and *D2* can be obtained from *D2* and *D1*, respectively, by integrating the abstracted sub-components.

We considered the (partial) components *D1*, *D2* and *D3* and used FIDDle to iteratively develop and check their contracts. For *D1*, the steps were as follows: (1) The *realizability checker* confirmed the existence of a model that refined *D1* and satisfied the properties of interest. (2) The *model checker* returned a counterexample for both properties of interest. For P*1*, the model checker returned a counterexample in which no plan was read and yet an action was performed. For P*2*, the counterexample was one where *Executive* got the *condList* lock without possessing the *exec* lock. To guarantee the satisfaction of P*1*, we added a post-condition to the black-box state *Read Plans* ensuring that a plan was read. We also added a pre-condition requiring that an action was not under execution when the black-box state *Read Plans* was entered. (3) The *well-formedness checker* returned a counterexample trace that reached the black-box state *Read Plans* while an action was under execution. (4) To ensure well-formedness, we added a post-condition to the black-box state *ExecuteTaskAction* ensuring that an action was not under execution when the system exited the black-box state. (5) The *model checker* confirmed that P*1* held. (6) To guarantee the satisfaction of P*2*, we added a post-condition to the black-box state *Read Plans* ensuring that when the control left the black-box, P*2* remained *true* and the *Executive* had the *exec* lock.

For design *D2*, the steps were as follows: (1) The *realizability checker* confirmed the existence of a model that refined *D2* and satisfied the properties of interest. (2) We ran the model checker, which returned a counterexample for both properties of interest. (3) We added to the black-box state *Read Plans* the same pre- and post-conditions as those developed for design *D1* and ran the *well-formedness* and the *model checker*. (4) The *well-formedness checker* confirmed that *D2* satisfied the pre-condition of the black-box *Read Plans*; the *model checker* certified the satisfaction of P*1* and P*2*.

Since the model of *Executive* was complete, we ran only the *model checker* to check *D3*. Properties P*1* and P*2* were satisfied.

*Sub-component Development.* We simulated a refinement process in which pre- and post-conditions were given to third parties for sub-component development. We considered the sub-components *SUB1* and *SUB2*, containing the portions of the *Executive* component abstracted by the black-box states *ExecuteTaskAction* and *Read Plans*, respectively. We ran the *substitutability checker* to verify, affirmatively, whether *SUB1* and *SUB2* ensured the post-conditions of the black-box states *ExecuteTaskAction* and *Read Plans* given their pre-conditions.

*Sub-component Integration.* We then plugged the designed sub-components into their corresponding black-box states. We integrated each sub-component into design *D1* and used the *model checker* to verify the resulting (partial) components w.r.t. properties *P1* and *P2*. The properties were satisfied, as intended.

**Results.** FIDDle was effective in analyzing partial components and in helping change their design to ensure the satisfaction of the properties of interest. The experiment confirmed the possibility of distributing the design of sub-components for the black-box states. As expected, no rework at the integration level was required, i.e., integration produced components that satisfied the properties of interest. This confirmed that FIDDle supports verification-driven, iterative, and distributed development of components.

**Threats to Validity.** A threat to construct validity concerns the (manual) construction of the intermediate models produced during development, obtained by abstracting an existing component model, and the design of the properties to be considered. However, the intermediate partial designs and the selected properties were based on original developer comments present in the model. A threat to internal validity concerns the design of the contracts (pre- and post-conditions and interfaces) for the black-box states chosen along the process. However, the pre- and post-conditions were designed by consulting property specification patterns proposed in the literature [16]. The fact that a single example was considered is a threat to external validity. However, the considered example is a medium-sized, complex, real case study [6,22,35].


**Table 1.** Results of experiments *E*1 and *E*2.

#### **6.2 Assessing Scalability**

We set up two experiments (*E1* and *E2* ) comparing the performance of the *well-formedness* and *substitutability checkers* with that of classical model checking as the size of the partial components under development and of their environments grew. Our experiments were based on a set of *randomly generated* models.

*E1*. To evaluate the *well-formedness checker*, we generated an LTS model of the environment and a complete model for the component. We checked the parallel composition of the component and the environment w.r.t. a property of interest using a standard model checker. Then, we generated a partial component by marking one of the states of the complete component as a black-box, defined pre- and post-conditions for it, and ran the well-formedness checker, comparing the performance of the two.

*E2*. To evaluate the *substitutability checker*, we generated a complete component as in the previous experiment. Then, we extracted a sub-component by selecting half of the component states and the transitions between them. States q<sup>0</sup> and q<sup>f</sup> were added to the sub-component as the initial and final state, respectively. State q<sup>0</sup> (resp., q<sup>f</sup>) was connected with all the states of the sub-component that had, in the original component, at least one incoming (resp., outgoing) transition from (resp., to) a state that was not added to the sub-component. We defined the pre- and post-conditions for the sub-component and ran the substitutability checker, comparing its performance with model checking.

**Experimental Setup.** We implemented a *random model generator* to create LTSs with a specified number of states, transition density (transitions per state), and number of events. We generated environments with an increasing number of states: 10, 100, and 1000. We fixed the transition density at 10 and the cardinality of the set of events at 50. We considered components with 10, 50, 100, 250, 500, 750, and 1000 states. The components were generated using the same transition density and number of events as the produced environment. To produce the partial component, we marked one of the states of the component obtained previously as a black-box and randomly selected 25% of the events of the component as the interface of the partial component. To produce the sub-component, we randomly extracted half of the component states and the transitions between them.
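The random model generator described above can be sketched as follows. This is a hypothetical minimal re-implementation, not the authors' tool; it represents an LTS as a map from integer states to lists of outgoing (event, target) transitions:

```python
import random

def random_lts(num_states, density, num_events, seed=None):
    """Generate a random LTS: {state: [(event, target_state), ...]}.

    Illustrative sketch: each of the `num_states` states gets exactly
    `density` outgoing transitions, labelled with events drawn uniformly
    from an alphabet of `num_events` symbols.
    """
    rng = random.Random(seed)
    events = [f"e{i}" for i in range(num_events)]
    return {
        s: [(rng.choice(events), rng.randrange(num_states))
            for _ in range(density)]
        for s in range(num_states)
    }

# One environment from the smallest configuration in the paper's setup.
lts = random_lts(num_states=10, density=10, num_events=50, seed=42)
```

With this shape, larger configurations (e.g. 1000-state environments) differ only in the arguments passed.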

*Properties of Interest, Pre- and Post-conditions.* We considered properties corresponding to commonly used property patterns [16], where Q and P are appropriately defined fluents. We considered K1, K2, and K3 as pre- and post-conditions for the black-box.

**Methodology and Results.** We ran each experiment 5 times on a 2 GHz Intel Core i7 with 8 GB of 1600 MHz DDR3 RAM. For each combination of values of #EnvStates and #ContStates, we computed the average time required by the well-formedness checker (Tw) and by the model checker (Tm) for experiment E1, and the average time required by the substitutability checker (Ts) and by the model checker (Tm) for experiment E2 (see Table 1). The results show that the well-formedness and substitutability checkers scale as well as the classical model checker.

**Threats to Validity.** The procedure employed to randomly generate models is a threat to construct validity. However, the transition density of the components was chosen based on the Mars Rover example. Furthermore, the number of states of the sub-component was chosen such that the ratio between the sizes of the component and the sub-component was approximately the same as in the Mars Rover. The properties considered in the experiment are a threat to internal validity. However, they were chosen by consulting property specification patterns proposed in the literature [16]. Considering a single black-box state is a threat to external validity. However, our goal was to evaluate how FIDDle scales with respect to the component and environment sizes, not w.r.t. the number of black-box states or the size of the post-conditions. Considering multiple black-box states reduces to the case of a single black-box with a more complex post-condition.

## **7 Related Work**

We discuss approaches for developing incrementally correct components.

**Modeling Partiality.** Modal Transition Systems [21], Partial Kripke Structures [8], and LTS<sup>↑</sup> [17] support the specification of incomplete concurrent systems and can be used in an iterative development process. Other formalisms, such as Hierarchical State Machines (HSMs) [4], are used to model sequential processes via a top-down development process but can only be analyzed when a fully specified model is available.

**Checking Partial Models.** Approaches to analyzing partial models (e.g., [8,10]) are not applicable to the problem considered in this paper, where missing sub-components are specified using contracts and their development is distributed across different development teams. The assumption generation problem for LTSs [17] is complementary to the one considered in this paper and concerns the computation of an assumption that describes how the system model interacts with the environment.

**Substitutability Checking**. The goal of substitutability checking is to verify whether a possibly partial sub-component can be plugged into a higher level structure without affecting its correctness. Problems such as "compositional reasoning" [1,19,30], "component substitutability" [9], and "hierarchical model checking" [4] are related to this part of our work. Our work differs because we first guarantee that the properties of interest are satisfied in the initially-defined partial component and then check that the provided sub-components can be plugged into the initial component.

**Synthesis**. Program synthesis [14,31] aims at computing a model of the system that satisfies the properties of interest. Moreover, synthesis can be used to generate assumptions on a system's environment that make its specification realizable (e.g., [23]). Sketch [36] supports programmers in describing an initial structure of the program that can be completed using synthesis techniques, but does not explicitly consider models. Many techniques for synthesizing components have been proposed, e.g., [14,37], and fully automated synthesis of highly nontrivial components with over 2000 states is becoming possible [11] for special cases, by limiting the types of synthesizable goals and using heuristics. However, such cases might not be applicable in general. Recent work has been done in the direction of compositional [2,3] and distributed [34] synthesis. We do not consider our approach an alternative to synthesis, but rather a way to combine synthesis techniques with human design.

## **8 Conclusion**

We presented a verification-driven methodology, called FIDDle, to support iterative distributed development of components. It enables recursively decomposing a component into a set of sub-components so that the correctness of the overall component is ensured. Development of sub-components that satisfy their specifications can then be done independently, via distributed development. We have evaluated FIDDle on a realistic Mars Rover case study. Scalability was evaluated using randomly generated examples.

**Acknowledgments.** Research partly supported by the EU H2020 Research and Innovation Programme under GA No. 731869 (Co4Robots).

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Summarizing Software API Usage Examples Using Clustering Techniques**

Nikolaos Katirtzis1,2(B) , Themistoklis Diamantopoulos<sup>3</sup> , and Charles Sutton<sup>2</sup>

<sup>1</sup> Hotels.com, London, UK
nkatirtzis@ed-alumni.net
<sup>2</sup> School of Informatics, University of Edinburgh, Edinburgh, UK
csutton@ed.ac.uk
<sup>3</sup> Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Thessaloniki, Greece
thdiaman@issel.ee.auth.gr

**Abstract.** Developers often use third-party libraries to facilitate software development, but the lack of proper API documentation for these libraries undermines their reuse potential. Although several approaches extract usage examples for libraries, they are usually tied to specific language implementations, while the produced examples are often redundant and are not presented as concise and readable snippets. In this work, we propose a novel approach that extracts API call sequences from client source code and clusters them to produce a diverse set of source code snippets that effectively covers the target API. We further construct a summarization algorithm to present concise and readable snippets to the users. Upon evaluating our system on software libraries, we show that it achieves high coverage of API methods, while the produced snippets are of high quality and closely match handwritten examples.

**Keywords:** API usage mining · Documentation · Source code reuse · Code summarization · Mining software repositories

## **1 Introduction**

Third-party libraries and frameworks are an integral part of current software systems. Access to the functionality of a library is typically offered by its API, which may consist of numerous classes and methods. However, as noted by multiple studies [24,30], APIs often lack proper examples and documentation and, in general, sufficient explanation of how they are to be used. Thus, developers often use general-purpose or specialized code search engines (CSEs) and Question-Answering (QA) communities, such as Stack Overflow, in order to find possible API usages. However, the search process in these services can be time-consuming [13], while the source code snippets provided on web sites and in QA communities might be difficult to recognise, ambiguous, or incomplete [28,29].

As a result, several researchers have studied the problem of API usage mining, which can be described as automatically identifying a set of patterns that characterize how an API is typically used from a corpus of client code [11]. There are two main types of API mining methods. First are methods that return API call sequences, using techniques such as frequent sequence mining [31–33], clustering [25,31,33], and probabilistic modeling [9]. Though interesting, API call sequences do not always describe important information like method arguments and control flow, and their output cannot be directly included in one's code.

A second class of approaches automatically produces source code snippets which, compared to API call sequences, provide more information to the developer and are more similar to human-written examples. Methods for mining snippets, however, tend to rely on detailed semantic analysis, including program slicing [5,13–15] and symbolic execution [5], which can make them more difficult to deploy to new languages. Furthermore, certain approaches do not use any clustering techniques, thus resulting in a redundant and non-diverse set of API source code snippets [20], which is not representative as it only uses a few API methods, as noted by Fowkes and Sutton [9]. On the other hand, approaches that do use clustering techniques are usually limited by their choice of clustering algorithms [34] and/or use feature sets that are language-specific [13–15].

In this paper, we propose *CLAMS (Clustering for API Mining of Snippets)*, an approach for mining API usage examples that lies between snippet and sequence mining methods, which ensures lower complexity and thus could apply more readily to other languages. The basic idea is to cluster a large set of usage examples based on their API calls, generate summarized versions for the top snippets of each cluster, and then select the most representative snippet from each cluster, using a tree edit distance metric on the ASTs. This results in a diverse set of examples in the form of concise and readable source code snippets. Our method is entirely data-driven, requiring only syntactic information from the source code, and so could be easily applied to other programming languages. We evaluate CLAMS on a set of popular libraries, where we illustrate how its results are more diverse in terms of API methods than those of other approaches, and assess to what extent the snippets match human-written examples.

## **2 Related Work**

Several studies have pointed out the importance of API documentation in the form of examples when investigating API usability [18,22] and API adoption in cases of highly evolving APIs [16]. Different approaches have thus been presented to find or create such examples; from systems that search for examples on web pages [28], to ones that mine such examples from client code located in source code repositories [5], or even from video tutorials [23]. Mining examples from client source code has been a typical approach for Source Code-Based Recommendation Systems (*SCoReS*) [19]. Such methods are distinguished according to their output which can be either source code snippets or API call sequences.

#### **2.1 Systems that Output API Call Sequences**

One of the first systems to mine API usage patterns is *MAPO* [32] which employs *frequent sequence mining* [10] to identify common usage patterns. Although the latest version of the system outputs the API call sequences along with their associated snippets [33], it is still more of a sequence-based approach, as it presents the code of the client method without performing any summarization, while it also does not consider the structure of the source code snippets.

Wang et al. [31] argue that MAPO outputs a large number of usage patterns, many of which are redundant. The authors therefore define *scalability*, *succinctness* and *high-coverage* as the required characteristics of an API miner and construct UP-Miner, a system that mines probabilistic graphs of API method calls and extracts more useful patterns than MAPO. However, the presentation of such graphs can be overwhelming when compared to ranked lists.

Recently, Fowkes and Sutton [9] proposed a method for mining API usage patterns called PAM, which uses probabilistic machine learning to mine a less redundant and more representative set of patterns than MAPO or UP-Miner. This paper also introduced an automated evaluation framework, using handwritten library usage examples from Github, which we adapt in the present work.

#### **2.2 Systems that Output Source Code Snippets**

A typical snippet mining system is *eXoaDocs* [13–15], which employs slicing techniques to summarize snippets retrieved from online sources into useful documentation examples, which are further organized using clustering techniques. However, clustering is performed using semantic feature vectors approximated by the Deckard tool [12], and such features are not straightforward to extract for different programming languages. Furthermore, eXoaDocs only targets usage examples of single API methods, as its feature vectors do not include information for mining frequent patterns with multiple API method calls.

*APIMiner* [20] introduces a summarization algorithm that uses slicing to preserve only the API-relevant statements of the source code. Further work by the same authors [4] incorporates association rule techniques, and employs an improved version of the summarization algorithm, with the aim of resolving variable types and adding descriptive comments. Yet the system does not cluster similar examples, while most examples show the usage of a single API method.

Even when slicing is employed in the aforementioned systems, the examples often contain extraneous statements (i.e. statements that could be removed as they are not related to the API), as noted by Buse and Weimer [5]. Hence, the authors introduce a system that synthesizes representative and well-typed usage examples using path-sensitive data flow analysis, clustering, and pattern abstraction. The snippets are complete and abstract, including abstract naming and helpful code, such as try/catch statements. However, the sophistication of their program analysis makes the system more complex [31], and increases the required effort for applying it to new programming languages.

Allamanis and Sutton [1] present a system for mining syntactic idioms, which are syntactic patterns that recur frequently and are closely related to snippets, and thus many of their mined patterns are API snippets. That method is language agnostic, as it relies only on ASTs, but uses a sophisticated statistical method based on Bayesian probabilistic grammars, which limits its scalability.

Although the aforementioned approaches can be effective in certain scenarios, they also have several drawbacks. First, most systems output API call sequences or other representations (e.g. call graphs), which may not be as helpful as snippets, both in terms of understanding and from a reuse perspective (e.g. adapting an example to fit one's own code). Several of the systems that output snippets do not group them into clusters and thus do not provide a diverse set of usage examples, and even when clustering is employed, the set of features may not allow extending the approach to other programming languages. Finally, certain systems do not provide concise and readable snippets, as their source code summarization capabilities are limited.

In this work, we present a novel API usage mining system, CLAMS, to overcome the above limitations. CLAMS employs clustering to group similar snippets and the output examples are subsequently improved using a summarization algorithm. The algorithm performs heuristic transformations, such as variable type resolution and replacement of literals, while it also removes non-API statements, in order to output concise and readable snippets. Finally, the snippets are ranked in descending order of support and given along with comprehensive comments.

## **3 Methodology**

## **3.1 System Overview**

The architecture of the system is shown in Fig. 1. The input for each library is a set of *Client Files* and the API of the library. The *API Call Extractor* generates a list of API call sequences from each method. The *Clustering Preprocessor* computes a distance matrix of the sequences, which is used by the *Clustering Engine* to cluster them. After that, the top (most representative) sequences from

**Fig. 1.** Overview of the proposed system.

each cluster are selected (*Clustering Postprocessor* ). The source code and the ASTs (from the *AST Extractor* ) of these top snippets are given to the *Snippet Generator* that generates a summarized snippet for each of them. Finally, the *Snippet Selector* selects a single snippet from each cluster, and the output is given by the *Ranker* that ranks the examples in descending order of support.

#### **3.2 Preprocessing Module**

The Preprocessing Module receives as input the client source code files and extracts their ASTs and their API call sequences. The *AST Extractor* employs srcML [8] to convert source code to an XML AST format, while the *API Call Extractor* extracts the API call sequences using the extractor provided by Fowkes and Sutton [9] which uses the Eclipse JDT parser to extract method calls using depth-first AST traversal.

#### **3.3 Clustering Module**

We perform clustering at the sequence level, instead of the source code level, thereby considering all useful API information contained in the snippets. As an example, the snippets in Figs. 2a and b would be clustered together by our Clustering Engine, as they contain the same API call sequence. Given the large number and the diversity of the files, our approach is more effective than a clustering that would consider the structure of the client code, and this decision also makes deployment to new languages easier. Note, however, that we take the structure of clustered snippets into consideration at a later stage (see Sect. 3.5).

**Fig. 2.** The sample client code on the left contains the same API calls as the client code on the right; the calls are encircled in both snippets.

Our clustering methodology involves first generating a distance matrix and then clustering the sequences using this matrix. The *Clustering Preprocessor* uses the *Longest Common Subsequence (LCS)* between any two sequences in order to compute their distance and then create the distance matrix. Given two sequences *S*<sup>1</sup> and *S*2, their LCS distance is defined as:

$$LCS\_dist(S\_1, S\_2) = 1 - 2 \cdot \frac{|LCS(S\_1, S\_2)|}{|S\_1| + |S\_2|}\tag{1}$$

where |*S1*| and |*S2*| are the lengths of *S1* and *S2*, and |*LCS*(*S1*, *S2*)| is the length of their LCS. Given the distance matrix, the *Clustering Engine* uses the *k*-medoids algorithm, based on the implementation provided by Bauckhage [3], and the hierarchical version of DBSCAN, known as *HDBSCAN* [7], using the implementation provided by McInnes et al. [17].
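Equation (1) and the distance-matrix construction can be sketched in Python as follows. This is an illustrative re-implementation, not the CLAMS code; sequences are plain lists of API method names:

```python
def lcs_len(s1, s2):
    # Dynamic-programming longest common subsequence length.
    dp = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i, a in enumerate(s1):
        for j, b in enumerate(s2):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a == b else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(s1)][len(s2)]

def lcs_dist(s1, s2):
    # Eq. (1): 0 for identical sequences, 1 for fully disjoint ones.
    return 1 - 2 * lcs_len(s1, s2) / (len(s1) + len(s2))

def distance_matrix(seqs):
    # Pairwise distances fed to k-medoids / HDBSCAN.
    return [[lcs_dist(a, b) for b in seqs] for a in seqs]
```

The resulting matrix can be handed directly to any clustering algorithm that accepts a precomputed distance matrix.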

The next step is to retrieve the source code associated with the most representative sequence of each cluster (*Clustering Postprocessor* ). Given, however, that each cluster may contain several snippets that are identical with respect to their sequences, we select multiple snippets for each cluster, thereby retaining source code structure information, which will be useful for selecting a single snippet (see Sect. 3.5). Our analysis showed that selecting all possible snippets did not further improve the results; we therefore select *n* snippets and set *n* to 5 for our experiments, as higher values did not affect the results.

#### **3.4 Snippet Generator**

The *Snippet Generator* generates a summarized version for the top snippets. Our summarization method, a static, flow-insensitive, intra-procedural slicing approach, is presented in Fig. 3. The input (Fig. 3, top left) is the snippet source code, the list of its invoked API calls and a set of variables defined in its outer scope (encircled and highlighted in bold respectively).

At first, any comments are removed and literals are replaced by their srcML type, i.e. string, char, number or boolean (*Step 1* ). In *Step 2*, the algorithm creates two lists, one for API and one for non-API statements (highlighted in bold), based on whether an API method is invoked or not in each statement. Any *control flow statements* that include API statements in their code block are also retained (e.g. the else statement in Fig. 3). In *Step 3*, the algorithm creates a list with all the variables that reside in the local scope of the snippet (highlighted in bold). This is followed by the removal of all non-API statements (*Step 4* ), by traversing the AST in reverse (bottom-up) order.

In *Step 5*, the list of declared variables is filtered, and only those used in the summarized tree are retained (highlighted in bold). Moreover, the algorithm creates a list with all the variables that are declared in API statements and used only in non-API statements (encircled). In *Step 6*, the algorithm adds declarations (encircled) for the variables retrieved in Step 5. Furthermore, descriptive comments of the form "Do something with variable" (highlighted in bold) are added for the variables that are declared in API statements and used in non-API statements (retrieved also in Step 5). Finally, the algorithm adds "Do something" comments in any empty blocks (highlighted in italics).

Finally, note that our approach is considerably simpler than static, syntax-preserving slicing. For example, static slicing would not remove any of the statements inside the else block, as the call to the getFromUser API method is assigned to a variable (userName), which is then used in the assignment of user. Our approach, on the other hand, performs a single pass over the AST, thus ensuring lower complexity, which in turn reduces the overall complexity of our system.
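The statement-filtering core of the summarization (Steps 2 and 4) can be illustrated with a toy sketch. It makes the simplifying assumption that each statement is already annotated with the methods it calls; the real algorithm operates on the srcML AST and additionally keeps enclosing control flow, re-inserts needed declarations, and adds descriptive comments:

```python
def summarize(statements, api_methods):
    """Toy sketch of Steps 2 and 4: keep only statements that invoke at
    least one API method, drop the rest.

    `statements` is a list of (code, called_methods) pairs; both the
    statement texts and method names below are hypothetical examples.
    """
    return [code for code, calls in statements
            if set(calls) & set(api_methods)]

snippet = [
    ("Status st = twitter.getStatus();", ["Twitter.getStatus"]),
    ("log.info(st);", ["Logger.info"]),          # non-API: removed
    ("String text = st.getText();", ["Status.getText"]),
]
api = {"Twitter.getStatus", "Status.getText"}
summary = summarize(snippet, api)
```

Here `summary` keeps only the two API-relevant statements, mirroring how the full algorithm prunes non-API statements in a single bottom-up pass.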

**Fig. 3.** Example summarization of source code snippet.

#### **3.5 Snippet Selector**

The next step is to select a single snippet for each cluster. Given that the selected snippet has to be the most representative of the cluster, we select the one that is most similar to the other top snippets. The score between any two snippets is defined as the tree edit distance between their ASTs, computed using the AP-TED algorithm [21]. Given this metric, we create a matrix for each cluster, which contains the distance between any two top snippets of the cluster. Finally, we select the snippet with the minimum sum of distances in each cluster's matrix.
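The selection step amounts to picking the medoid of each cluster under tree edit distance. The sketch below is illustrative: the `dist` argument stands in for the AP-TED distance on ASTs, and the string-length distance used in the example is only a stand-in so the code is self-contained:

```python
def select_representative(snippets, dist):
    """Return the snippet with the minimum sum of pairwise distances to
    the other top snippets of its cluster (the cluster medoid).

    `dist` is any symmetric distance; CLAMS uses the AP-TED tree edit
    distance between ASTs here.
    """
    sums = [sum(dist(s, t) for t in snippets) for s in snippets]
    return snippets[sums.index(min(sums))]

# Toy stand-in distance: absolute difference of snippet lengths.
snips = ["a();", "a(); b();", "a(); b(); c();"]
rep = select_representative(snips, lambda s, t: abs(len(s) - len(t)))
```

With the toy distance, the middle snippet minimizes the distance sum, just as the structurally "central" snippet would under a real tree edit distance.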

#### **3.6 Ranker**

We rank the snippets according to the support of their API call sequences, as in [9]. Specifically, if the API call sequence of a snippet is a subsequence of the sequence of a file in the repository, then we say that the file supports the snippet. For example, the snippet with API call sequence [twitter4j.Status.getUser, twitter4j.Status.getText] is supported by a file with sequence [twitter4j.Paging.*<*init*>*, twitter4j.Status.getUser, twitter4j.Status.getId, twitter4j.Status.getText, twitter4j.Status.getUser]. In this way, we compute the support for each snippet and create a complete ordering. Upon ordering the snippets, the AStyle formatter [2] is also used to fix indentation and spacing.
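The support-based ranking can be sketched as follows; this is an illustrative re-implementation, with the sequences taken from the paper's twitter4j example:

```python
def is_subsequence(sub, seq):
    # True if `sub` occurs in `seq` in order (not necessarily contiguously).
    it = iter(seq)
    return all(call in it for call in sub)

def rank_by_support(snippet_seqs, file_seqs):
    """Return snippet indices sorted by the number of repository files
    whose API call sequence contains the snippet's sequence."""
    support = [sum(is_subsequence(s, f) for f in file_seqs)
               for s in snippet_seqs]
    return sorted(range(len(snippet_seqs)), key=lambda i: -support[i])

snippet = ["twitter4j.Status.getUser", "twitter4j.Status.getText"]
file_seq = ["twitter4j.Paging.<init>", "twitter4j.Status.getUser",
            "twitter4j.Status.getId", "twitter4j.Status.getText",
            "twitter4j.Status.getUser"]
supported = is_subsequence(snippet, file_seq)
```

The `call in it` idiom consumes the iterator as it searches, which is exactly the in-order subsequence check the support definition requires.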

## **3.7 Deploying to New Languages**

Our methodology can easily be applied to different programming languages. The Preprocessing Module and the Snippet Selector make use of the source code's AST, which is straightforward to extract for different languages. The Clustering Module and the Ranker use API call sequences rather than language-specific semantic features, while our summarization algorithm relies on statements and their control flow, a fundamental concept of imperative languages. Thus, extending our methodology to additional programming languages requires only the extraction of the AST of the source code, which can be done using appropriate tools (e.g. srcML), and possibly a minor adjustment to our summarization algorithm to conform to the AST schema extracted by different tools.

## **4 Evaluation**

## **4.1 Evaluation Framework**

We evaluate CLAMS on the APIs (all public methods) of 6 popular Java libraries, which were selected as they are popular (based on their GitHub stars and forks), cover various domains, and have handwritten examples to compare our snippets with. The libraries are shown in Table 1, along with certain statistics concerning the lines of code of their examples' directories (Example LOC) and the lines of code considered from GitHub as using their API methods (Client LOC).


**Table 1.** Summary of the evaluation dataset.

To further strengthen our hypothesis, we also employ an automated method for evaluating our system, to allow quantitative comparison of its different variants. To assess whether the snippets of CLAMS are representative, we look for "gold standard" examples online, as writing our own examples would be time-consuming and would lead to subjective results.

We focus our evaluation on the 4 research questions of Fig. 4. RQ1 and RQ2 refer to summarization and clustering respectively and will be evaluated with respect to handwritten examples. For RQ3 we assess the API coverage achieved by CLAMS versus the ones achieved by the API mining systems MAPO [32,33] and UP-Miner [31]. RQ4 will determine whether the extra information of source code snippets when compared to API call sequences is useful to developers.

**RQ1:** How much more concise, readable, and precise with respect to handwritten examples are the snippets after summarization?

**RQ2:** Do more powerful clustering techniques, that cluster similar rather than identical sequences, lead to snippets that more closely match handwritten examples? **RQ3:** Does our tool mine more diverse patterns than other existing approaches? **RQ4:** Do snippets match handwritten examples more than API call sequences?

**Fig. 4.** Research Questions (RQs) to be evaluated.

We consider four configurations for our system: *NaiveNoSum*, *NaiveSum*, *KMedoidsSum*, and *HDBSCANSum*. To reveal the effect of clustering sequences, the first two configurations do not use any clustering and only group identical sequences together, while the last two use the *k*-medoids and *HDBSCAN* algorithms, respectively. Also, the first configuration (*NaiveNoSum*) does not employ our summarizer, while all the others do, so that we can measure its effect.

We define metrics to assess the *readability*, *conciseness*, and *quality* of the returned snippets. For readability, we use the metric defined by Buse and Weimer [6], which is based on human studies and agrees with a large set of human annotators. Given a Java source code file, the tool provided by Buse and Weimer [27] outputs a value in the range [0.0, 1.0], where a higher value indicates a more readable snippet. For conciseness, we use the number of *Physical Lines of Code* (*PLOCs*). Both metrics have already been used for the evaluation of similar systems [5]. For quality, as a proxy measure we use the similarity of the set of returned snippets to a set of handwritten examples from the module's developers.

We define the similarity of a snippet *s* given a set of examples *E* as *snippet precision*. First, we define a set *E<sup>s</sup>* with all the examples in *E* that have exactly the same API calls as snippet *s*. After that, we compute the similarity of *s* with all matching examples *e* ∈ *E<sup>s</sup>* by splitting the code into sets of tokens and applying set similarity metrics<sup>1</sup>. Tokenization is performed using a Java code tokenizer, and the tokens are cleaned by removing symbols (e.g. brackets) and comments, and by replacing literals (e.g. numbers) with their respective types. The precision of *s* is the maximum of its similarities with all *e* ∈ *E<sup>s</sup>*:

<sup>1</sup> We apply set similarity metrics instead of an edit distance metric because the latter is easily skewed by the order of statements in the source code (e.g. by nesting levels), and it would not provide a fair comparison between snippets and sequences.

$$Prec(s) = \max_{e \in E_s} \left\{ \frac{|T_s \cap T_e|}{|T_s|} \right\} \tag{2}$$

where *T<sub>s</sub>* and *T<sub>e</sub>* are the sets of tokens of the snippet *s* and of the example *e*, respectively. Finally, if no example has exactly the same API calls as the snippet (i.e. *E<sub>s</sub>* = ∅), then snippet precision is set to zero. Given the snippet precision, we also define the average snippet precision for *n* snippets *s*<sub>1</sub>, *s*<sub>2</sub>, ..., *s<sub>n</sub>* as:

$$AvgPrec(n) = \frac{1}{n} \sum_{i=1}^{n} Prec(s_i) \tag{3}$$
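As a concrete illustration, snippet precision (Eq. 2) and its average (Eq. 3) can be sketched in Python. The tokenizer below is a crude stand-in for the Java tokenizer used by the authors; its regex and the `<num>` placeholder are our own simplifications, not the paper's implementation.

```python
import re

def tokenize(code):
    # Crude stand-in for the paper's Java tokenizer (our own
    # simplification): keep identifiers, map number literals to a
    # type placeholder, drop symbols and punctuation.
    return {t if not t[0].isdigit() else "<num>"
            for t in re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?", code)}

def prec(snippet, examples_same_api):
    # Eq. (2): maximum, over examples with identical API calls, of
    # |T_s ∩ T_e| / |T_s|; zero when E_s is empty.
    t_s = tokenize(snippet)
    if not examples_same_api or not t_s:
        return 0.0
    return max(len(t_s & tokenize(e)) / len(t_s) for e in examples_same_api)

def avg_prec(precisions):
    # Eq. (3): mean snippet precision over n snippets.
    return sum(precisions) / len(precisions) if precisions else 0.0
```

Set intersection makes the metric insensitive to statement order, which is exactly why the footnote prefers it over edit distance.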

Similarly, average snippet precision at top *k* can be defined as:

$$AvgPrec@k = \frac{1}{k} \sum_{j=1}^{k} Prec@j \text{ where } Prec@j = \frac{1}{j} \sum_{i=1}^{j} Prec(s_i) \tag{4}$$

This metric is useful for evaluating our system, which outputs ordered results, as it allows us to illustrate and draw conclusions about precision at different cut-off levels.
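Assuming the per-snippet precisions of the ranked list have already been computed, Eq. (4) reduces to a short function (a sketch with our own names):

```python
def avg_prec_at_k(precisions, k):
    # Eq. (4): AvgPrec@k is the mean over j = 1..k of Prec@j,
    # where Prec@j is the mean precision of the top-j snippets.
    prec_at = [sum(precisions[:j]) / j for j in range(1, k + 1)]
    return sum(prec_at) / k
```

For example, precisions [1.0, 0.5] give Prec@1 = 1.0 and Prec@2 = 0.75, hence AvgPrec@2 = 0.875.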

We also define coverage at *k* as the number of unique API methods contained in the top *k* snippets. This metric has already been defined in a similar manner by Fowkes and Sutton [9], who claim that a list of patterns with identical methods would be redundant, non-diverse, and thus not representative of the target API.
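Coverage at *k* can be sketched directly from the definition; here each snippet is represented by the list of API methods it calls (an assumed input format, not the paper's data structure):

```python
def coverage_at_k(snippet_api_calls, k):
    # Coverage at k: number of unique API methods appearing in the
    # top-k snippets; each snippet is given as its list of API calls.
    covered = set()
    for calls in snippet_api_calls[:k]:
        covered.update(calls)
    return len(covered)
```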

Finally, we measure the additional information provided by source code snippets when compared with API call sequences. For each snippet we extract its *snippet-tokens* *T<sub>s</sub>*, as defined in (2), and its *sequence-tokens* *T′<sub>s</sub>*, which are extracted from the underlying API call sequence of the snippet, where each token is the name of an API method. Based on these sets, we define the *additional info* metric as:

$$AdditInfo = \frac{1}{m} \sum_{i=1}^{m} \frac{\max_{e \in E_s} \{|T_{s_i} \cap T_e|\}}{\max_{e \in E_s} \{|T'_{s_i} \cap T_e|\}} \tag{5}$$

where *m* is the number of snippets that match at least one example.
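A sketch of Eq. (5), assuming each matched snippet is supplied as a triple of its snippet-token set, sequence-token set, and the token sets of its matching examples (all names and the input shape are our own):

```python
def addit_info(snippet_entries):
    # Eq. (5): ratio of shared snippet-tokens to shared sequence-tokens,
    # averaged over the m snippets matched to at least one example.
    # Each entry: (snippet_tokens, sequence_tokens, example_token_sets).
    ratios = []
    for t_s, t_seq, examples in snippet_entries:
        if not examples:
            continue  # unmatched snippets are excluded from the average
        num = max(len(t_s & t_e) for t_e in examples)
        den = max(len(t_seq & t_e) for t_e in examples)
        if den:
            ratios.append(num / den)
    return sum(ratios) / len(ratios) if ratios else 0.0
```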

#### **4.2 Evaluation Results**

**RQ1: How much more concise, readable, and precise with respect to handwritten examples are the snippets after summarization?** We evaluate how much reduction in the size of the snippets is achieved by the summarization algorithm, and the effect of summarization on the precision with respect to handwritten examples. If snippets have equally high or higher precision after summarization, then this indicates that the tokens removed by summarization are ones that do not typically appear in handwritten examples, and thus are possibly less relevant. For this purpose, we use the first two versions of our system, namely the *NaiveSum* and the *NaiveNoSum* versions. Both of them use the naive clustering technique, where only identical sequences are clustered together. Figures 5a and b depict the average readability of the snippets mined for each library and the

**Fig. 5.** Figures of (a) the average readability, and (b) the average PLOCs of the snippets, for each library, with (*NaiveSum*) and without (*NaiveNoSum*) summarization.

average PLOCs, respectively. The readability of the mined snippets is almost doubled when performing summarization, while the snippets generated by the *NaiveSum* version are clearly smaller than those mined by *NaiveNoSum*. In fact, the majority of the snippets of *NaiveSum* contain fewer than 10 PLOCs, owing mainly to the algorithm's removal of non-API statements. On average, the summarization algorithm leads to 40% fewer PLOCs. Thus, we may argue that the snippets provided by our summarizer are readable and concise.

Apart from readability and conciseness, which are both regarded as highly desirable features [26], we further assess whether the summarizer produces snippets that closely match handwritten examples. Therefore, we plot the snippet precision at top *k* in Fig. 6a. The plot indicates a downward trend in precision for both configurations, which is explained by the fact that the snippets at lower ranks are more complex, as they normally contain a large number of API calls. In any case, it is clear that the version using the summarizer mines more precise snippets than the one not using it, for any value of *k*. For example, for *k* = 10, the summarizer increases snippet precision from 0.27 to 0.35, indicating that no useful statements are removed and no irrelevant statements are added.

**RQ2: Do more powerful clustering techniques, that cluster similar rather than identical sequences, lead to snippets that more closely match handwritten examples?** In this experiment we compare *NaiveSum*, *KMedoidsSum*, and *HDBSCANSum* to assess the effect of applying different clustering techniques on the snippets. In order for the comparison to be fair, we use the same number of clusters for both k-medoids and HDBSCAN. Therefore, we first run HDBSCAN (setting its *min cluster size* parameter to 2), and then use the number of clusters generated by the algorithm for *k*-medoids. After that, we consider the top *k* results of the three versions, so that the comparison with the Naive method (that cannot be tuned) is also fair. Hence, we plot precision against coverage, in a similar manner to precision versus recall graphs. For this

**Fig. 6.** Figures of (a) precision at top *k*, with (*NaiveSum*) or without (*NaiveNoSum*) summarization, and (b) the average interpolated snippet precision versus API coverage for three system versions (clustering algorithms), using the top 100 mined snippets.

we use the snippet precision at *k* and coverage at *k*, while we make use of an *interpolated* version of the curve, where the precision value at each point is the maximum for the corresponding coverage value. Figure 6b depicts the curve for the top 100 snippets, where the areas under the curves are shaded. Area *A2* reveals the additional coverage in API methods achieved by *HDBSCANSum*, when compared to *NaiveSum* (*A1* ), while *A3* shows the corresponding additional coverage of *KMedoidsSum*, when compared to *HDBSCANSum* (*A2* ).

*NaiveSum* achieves slightly better precision than the versions using clustering, which is expected as most of its top snippets use the same API calls and contain only a few API methods. As a consequence, however, its coverage is quite low, because only identical sequences are grouped together. Given that coverage is considered quite important when mining API usage examples [31], and that precision among all three configurations is similar, we may argue that *KMedoidsSum* and *HDBSCANSum* produce sufficiently precise and also more varied results for the developer. The differences between these two methods are mostly related to the separation among the clusters; the clusters created by *KMedoidsSum* are more separated and thus it achieves higher coverage, whereas *HDBSCANSum* has slightly higher precision. To achieve a trade-off between precision and coverage, we select *HDBSCANSum* for the last two RQs.

**RQ3: Does our tool mine more diverse patterns than other existing approaches?** For this research question, we compare the diversity of the examples mined by CLAMS with that of two API mining approaches, MAPO [32,33] and UP-Miner [31], which are the most similar to our approach from a mining perspective (since our mining also works at the sequence level)<sup>2</sup>. We measure diversity using coverage at *k*. Figure 7a depicts the coverage in API methods for each approach and each library, while Fig. 7b shows the average number of API methods covered at top *k*, using the top 100 examples of each approach.

<sup>2</sup> Comparing with other tools was also hard, as most are unavailable, such as, e.g., the eXoaDocs web app (http://exoa.postech.ac.kr/) or the APIMiner website (http://java.labsoft.dcc.ufmg.br/apimineride/resources/docs/reference/).

**Fig. 7.** Graphs of the coverage in API methods achieved by CLAMS, MAPO, and UP-Miner, (a) for each project, and (b) on average, at top *k*, using the top 100 examples.

The coverage by MAPO and UP-Miner is quite low, which is expected since both tools perform frequent sequence mining and thus generate several redundant patterns, a limitation also noted by Fowkes and Sutton [9]. Our system, in contrast, integrates clustering techniques to reduce redundancy, which is further reduced by selecting a single snippet from each cluster (Snippet Selector). Finally, the average coverage trend (Fig. 7b) indicates that our tool mines more diverse sequences than the other two tools, regardless of the number of examples.

**RQ4: Do source code snippets match handwritten examples more than API call sequences?** Obviously source code snippets contain more tokens than API call sequences, but the additional tokens might not be useful. Therefore, we measure specifically whether the additional tokens that appear in snippets rather than sequences also appear in handwritten examples. Computing the average of the *additional info* metric for each library, we find that the average ratio between snippet-tokens and sequence-tokens that are shared with the corresponding examples is 2.75. This means that presenting snippets instead of sequences leads to 2.75 times more information. By further plotting the additional information of the snippets for each library in Fig. 8, we observe that snippets almost always provide at least twice as much valuable information. To further illustrate the contrast between snippets and sequences, we present an indicative snippet mined by CLAMS in Fig. 9. Note, e.g., how the try/catch tokens are important, yet not included in the sequence tokens.

Finally, we present the top 5 usage examples mined by CLAMS, MAPO and UP-Miner, in Fig. 10. As one may observe, snippets provide useful information that is missing from sequences, including identifiers (e.g. String secret), control flow statements (e.g. if-then-else statements), etc. Moreover, snippets are easier to integrate into the source code of the developer, and thus facilitate reuse.

**Fig. 8.** Additional information revealed when mining snippets instead of sequences.

**Fig. 9.** Example snippet matched to handwritten example. Sequence-tokens are encircled and additional snippet-tokens are highlighted in bold.

Interestingly, the snippet ranked second by CLAMS has not been matched to any handwritten example, although it has high support in the dataset. In fact, there is no example for the setOauthConsumer method of *Twitter4J*, which is one of its most popular methods. This illustrates how CLAMS can also extract snippets beyond those of the examples directory, which are valuable to developers.

## **5 Threats to Validity**

The main threats to validity of our approach involve the choice of the evaluation metrics and the lack of comparison with snippet-based approaches. Concerning the metrics, snippet API coverage is typical when comparing API usage mining approaches. On the other hand, the choice of metrics for measuring snippet quality is indeed a subjective criterion. To address this threat, we have employed three metrics: conciseness (PLOCs), readability, and quality (similarity to handwritten examples). Our evaluation indicates that CLAMS is effective on all of these axes. In addition, as these metrics are applied to snippets, computing them for sequence-based systems such as MAPO and UP-Miner was not possible. Finally, to evaluate whether CLAMS can be practically useful when developing software, we plan to conduct a developer survey. To this end, we have already performed a preliminary study with a team of 5 Java developers at Hotels.com, the results of which were encouraging. More details about the study can be found at https://mast-group.github.io/clams/user-survey/ (omitted here due to space limitations).

Concerning the comparison with current approaches, we chose to compare CLAMS against sequence-based approaches (MAPO and UP-Miner), as the mining methodology is actually performed at the sequence level. Nevertheless, comparing with snippet-based approaches would also be useful, not only as a proof of concept but also because it would allow us to comparatively evaluate CLAMS with regard to the snippet quality metrics mentioned in the previous paragraph. However, such a comparison was problematic, as most current tools (including, e.g., eXoaDocs and APIMiner) are currently unavailable (see RQ3 of Sect. 4.2). We note this comparison as an important point for future work, and we have uploaded our code and findings online (https://mast-group.github.io/clams/) to assist future researchers who may face similar challenges.

#### **6 Conclusion**

In this paper we have proposed a novel approach for mining API usage examples in the form of source code snippets from client code. Our system uses clustering techniques, as well as a summarization algorithm, to mine useful, concise, and readable snippets. Our evaluation shows that snippet clustering leads to a better trade-off between precision and coverage, while the summarization algorithm effectively increases the readability and decreases the size of the snippets. Finally, our tool offers diverse snippets that match handwritten examples better than sequences.

In future work, we plan to extend the approach used to retrieve the top mined sequences from each cluster. We could use a two-stage clustering approach where, after clustering the API call sequences, we could further cluster the snippets of the formed clusters, using a tree edit distance metric. This would allow retrieving snippets that use the same API call sequence, but differ in their structure.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Fast Computation of Arbitrary Control Dependencies**

Jean-Christophe Léchenet1,2(B), Nikolai Kosmatov<sup>1</sup>, and Pascale Le Gall<sup>2</sup>

<sup>1</sup> CEA, LIST, Software Reliability Laboratory, PC 174, 91191 Gif-sur-Yvette, France

{jean-christophe.lechenet,nikolai.kosmatov}@cea.fr <sup>2</sup> Laboratoire de Mathématiques et Informatique pour la Complexité et les Systèmes, CentraleSupélec, Université Paris-Saclay, 91190 Gif-sur-Yvette, France pascale.legall@centralesupelec.fr

**Abstract.** In 2011, Danicic et al. introduced an elegant generalization of the notion of control dependence for any directed graph. They also proposed an algorithm computing the weak control-closure of a subset of graph vertices and performed a paper-and-pencil proof of its correctness. We have performed its proof in the Coq proof assistant. This paper also presents a novel, more efficient algorithm to compute weak control-closure, taking advantage of intermediate propagation results of previous iterations in order to accelerate the following ones. This optimization makes the design and proof of the algorithm more complex and requires subtle loop invariants. The new algorithm has been formalized and mechanically proven in the Why3 verification tool. Experiments on randomly generated graphs with up to thousands of vertices demonstrate that the proposed algorithm remains practical for real-life programs and significantly outperforms Danicic's initial technique.

## **1 Introduction**

**Context.** *Control dependence* is a fundamental notion in software engineering and analysis (e.g. [6,12,13,21,22,27]). It reflects structural relationships between different program statements and is intensively used in many software analysis techniques and tools, such as compilers, verification tools, test generators, program transformation tools, simulators, and debuggers. Along with data dependence, it is one of the key notions used in *program slicing* [25,27], a program transformation technique that decomposes a given program into a simpler one, called a program slice.

In 2011, Danicic et al. [11] proposed an elegant generalization of the notions of closure under non-termination insensitive (*weak*) and non-termination sensitive (*strong*) control dependence. They introduced the notions of weak and strong control-closures, that can be defined on any directed graph, and no longer only on control flow graphs. They proved that weak and strong control-closures subsume the closures under all forms of control dependence previously known in the literature. In the present paper, we are interested in the non-termination insensitive form, i.e. *weak control-closure*.

Besides the definition of weak control-closure, Danicic et al. also provided an algorithm computing it for a given set of vertices in a directed graph. This algorithm was proved by paper-and-pencil. Under the assumption that the given graph is a CFG (or more generally, that the maximal out-degree of the graph vertices is bounded), the complexity of the algorithm can be expressed in terms of the number of vertices n of the graph, and was shown to be O(n³). Danicic et al. themselves suggested that it should be possible to improve its complexity. This may explain why this algorithm was not used until now.

**Motivation.** Danicic et al. introduced basic notions used to define weak control-closure and to justify the algorithm, and proved a few lemmas about them. While formalizing these concepts in the Coq proof assistant [5,24], we have discovered that, strictly speaking, the paper-and-pencil proof of one of them [11, Lemma 53] is inaccurate (a previously proven case is applied while its hypotheses are not satisfied), whereas the lemma itself is correct. Furthermore, Danicic's algorithm does not take advantage of its iterative nature and does not reuse the results of previous iterations in order to speed up the following ones.

**Goals.** First, we fully formalize Danicic's algorithm, its correctness proof and the underlying concepts in Coq. Our second objective is to design a more efficient algorithm sharing information between iterations to speed up the execution. Since our new algorithm is carefully optimized and more complex, its correctness proof relies on more subtle arguments than for Danicic's algorithm. To deal with them and to avoid any risk of error, we have decided again to use a mechanized verification tool – this time, the Why3 proof system [1,14] – to guarantee correctness of the optimized version. Finally, in order to evaluate the new algorithm with respect to Danicic's initial technique, we have implemented both algorithms in OCaml (using OCamlgraph library [9]) and tested them on a large set of randomly generated graphs with up to thousands of vertices. Experiments demonstrate that the proposed optimized algorithm is applicable to large graphs (and thus to CFGs of real-life programs) and significantly outperforms Danicic's original technique.

**Contributions.** The contributions of this paper include:


The Coq, Why3 and OCaml implementations are all available in [17].

**Outline.** We present our motivation and a running example in Sect. 2. Then, we recall the definitions of some important concepts introduced by [11] in Sect. 3 and state two important lemmas in Sect. 4. Next, we describe Danicic's algorithm in Sect. 5 and our algorithm along with a sketch of the proof of its correctness in Sect. 6. Experiments are presented in Sect. 7. Finally, Sect. 8 presents some related work and concludes.

#### **2 Motivation and Running Example**

This section informally presents weak control-closure using a running example.

The inputs of our problem are a directed graph G = (V, E), with set of vertices (or nodes) V and set of edges E, and a subset of vertices V′ ⊆ V. The property of interest of such a subset is called *weakly control-closed* in [11] (cf. Definition 3). V′ is said to be *weakly control-closed* if the nodes reachable from V′ are V′*-weakly committing* (cf. Definition 2), i.e. always lead the flow to at most one node in V′. Since V′ does not necessarily satisfy this property, we want to build a superset of V′ satisfying it, and more particularly the smallest one, called the *weak control-closure* of V′ in G (cf. Definition 5). For that, as will be proved by Lemma 2, we need to add to V′ the points of divergence closest to V′, called

**Fig. 1.** Example graph G0, with V′0 = {u1, u3}

the V′*-weakly deciding* vertices, that are reachable from V′. Formally, vertex u is V′*-weakly deciding* if there exist two non-trivial paths starting from u and reaching V′ that have no common vertex except u (cf. Definition 4).

Let us illustrate these ideas on the example graph G0 shown in Fig. 1. V′0 = {u1, u3} is the subset of interest, represented with dashed double circles in Fig. 1. u5 is reachable from V′0 and is not V′0-weakly committing, since it is the origin of two paths u5, u6, u0, u1 and u5, u6, u0, u2, u3 that can lead the flow to two different nodes u1 and u3 in V′0. Therefore, V′0 is not weakly control-closed. To build the weak control-closure, we need to add to V′0 all V′0-weakly deciding nodes reachable from V′0. u0 is such a node. Indeed, it is reachable from V′0 and we can build two non-trivial paths u0, u1 and u0, u2, u3 starting from u0, ending in V′0 (in u1 and u3, respectively) and sharing no other vertex than u0. Similarly, nodes u2, u4 and u6 must be added as well. On the contrary, u5 must not be added, since every non-empty path starting from u5 has u6 as second vertex. More generally, a node with only one child cannot be a "divergence point closest to V′" and must never be added to build the weak control-closure. The weak control-closure of V′0 in G0 is thus {u0, u1, u2, u3, u4, u6}.
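The definitions above can be made executable with a small brute-force sketch (our own Python rendering, not the authors' Coq development). Since the full edge set of the running example is not reproduced here, the usage example below employs a hypothetical toy graph instead, given as an adjacency dictionary.

```python
from itertools import combinations

def vprime_paths(graph, u, vprime):
    # Non-trivial V'-paths from u (Definition 1): simple paths all of
    # whose vertices, except the last, avoid V', ending in V'.
    if u in vprime:
        return []
    paths, stack = [], [(u, (u,))]
    while stack:
        node, path = stack.pop()
        for child in graph.get(node, ()):
            if child in path:
                continue
            if child in vprime:
                paths.append(path + (child,))
            else:
                stack.append((child, path + (child,)))
    return paths

def weakly_deciding(graph, vprime):
    # WD_G(V') (Definition 4): vertices with two non-trivial V'-paths
    # sharing no vertex except their common start.
    return {u for u in graph
            if any(set(p) & set(q) == {u}
                   for p, q in combinations(vprime_paths(graph, u, vprime), 2))}

def reachable_from(graph, vprime):
    # R_G(V'): nodes reachable from V' (V' itself included, via trivial paths).
    seen, todo = set(vprime), list(vprime)
    while todo:
        for child in graph.get(todo.pop(), ()):
            if child not in seen:
                seen.add(child)
                todo.append(child)
    return seen

def weak_control_closure(graph, vprime):
    # Lemma 2: V' ∪ (WD_G(V') ∩ R_G(V')) is the weak control-closure.
    return set(vprime) | (weakly_deciding(graph, vprime)
                          & reachable_from(graph, vprime))
```

For the toy graph `{"a": ["b", "c"], "b": ["a"], "c": [], "d": ["a"]}` with V′ = {b, c}, vertex a is V′-weakly deciding (via the disjoint paths a, b and a, c) and reachable from V′, so the closure is {a, b, c}; d is excluded because its two paths to V′ share a.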

To build the closure, Danicic's algorithm, like the one we propose, does not directly try to build the two paths sharing only one node. Both algorithms rely on a concept called *observable vertex*. Given a vertex u ∈ V, the set of *observable vertices* in V′ from u contains all nodes of V′ reachable from u without using edges starting in V′. The important property of this notion is that, as will be proved by Lemma 4, if there exists an edge (u, v) ∈ E such that u is not in V′, u is reachable from V′, v can reach V′ and there exists a vertex w

**Fig. 2.** Example graph G0 annotated with observable sets

observable from u but not from v, then u must be added to V′ to build the weak control-closure. Figure 2a shows our example graph G0, each node being annotated with its set of observables in V′0.

(u0, u1) is an edge such that u0 is reachable from V′0, u1 can reach V′0 and u3 is an observable vertex from u0 in V′0 but not from u1. u0 is thus a node to be added to the weak control-closure. Likewise, from the edges (u2, u3) and (u4, u3), we can deduce that u2 and u4 belong to the closure. However, we have seen that u6 belongs to the closure, but it is not possible to apply the same reasoning to (u6, u0), (u6, u4) or (u6, u5). We need another technique. As Lemma 3 will establish, the technique is actually iterative. We can add to the initial V′0 the nodes that we have already detected and apply our technique to this new set V′′0. The vertices detected this way will also be in the closure of the initial set V′0. The observable sets w.r.t. V′′0 = V′0 ∪ {u0, u2, u4} are shown in Fig. 2b. This time, both edges (u6, u4) and (u6, u0) allow us to add u6 to the closure. Applying the technique again with the augmented set V′′′0 = V′′0 ∪ {u6} (cf. Fig. 2c) does not reveal new vertices. This means that all the nodes have already been found. We obtain the same set as before for the weak control-closure of V′0, i.e. {u0, u1, u2, u3, u4, u6}.
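The iterative, observable-based technique described above can be rendered as a naive Python sketch: compute observable sets by a search that does not traverse edges leaving V′, then repeatedly add any vertex satisfying the edge condition until a fixpoint. This is our own illustration, not the authors' verified algorithm, and the test graph is again a hypothetical toy example.

```python
def obs(graph, u, w_set):
    # Observable vertices from u in W (Definition 6): nodes of W
    # reachable from u via paths whose non-final vertices avoid W.
    if u in w_set:
        return {u}  # Remark 2: u in W is its own unique observable
    seen, todo, found = {u}, [u], set()
    while todo:
        for child in graph.get(todo.pop(), ()):
            if child in w_set:
                found.add(child)
            elif child not in seen:
                seen.add(child)
                todo.append(child)
    return found

def closure_via_obs(graph, vprime):
    # Naive fixpoint: add u whenever some edge (u, v) has an
    # observable from u that is not observable from v, u is outside
    # W but reachable from V', and v can reach W (obs(v, W) nonempty).
    reach, todo = set(vprime), list(vprime)
    while todo:  # R_G(V'), computed once
        for child in graph.get(todo.pop(), ()):
            if child not in reach:
                reach.add(child)
                todo.append(child)
    w, changed = set(vprime), True
    while changed:
        changed = False
        for u in graph:
            if u in w or u not in reach:
                continue
            for v in graph.get(u, ()):
                if obs(graph, v, w) and obs(graph, u, w) - obs(graph, v, w):
                    w.add(u)
                    changed = True
                    break
    return w
```

On the toy graph `{"a": ["b", "c"], "b": ["a"], "c": [], "d": ["a"]}` with V′ = {b, c}, the edge (a, b) satisfies the condition (c is observable from a but not from b), so a is added, after which no further edge qualifies.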

#### **3 Basic Concepts**

This section introduces basic definitions and properties needed to define the notion of weak control-closure. They have been formalized in Coq [17], including in particular Property 3, whose proof in [11] was inaccurate.

From now on, let G = (V, E) denote a directed graph, and V′ a subset of V. We define a *path* in G in the usual way. We write u →path v if there exists a path from u to v. Let RG(V′) = {v ∈ V | ∃u ∈ V′, u →path v} be the set of nodes reachable from V′. In our example (cf. Fig. 1), u6, u0, u1, u3 is a (4-node) path in G0, u1 is a trivial one-node path in G0 from u1 to itself, and RG0(V′0) = V0.

**Definition 1 (**V′**-disjoint,** V′**-path).** *A path* π *in* G *is said to be* V′-disjoint *in* G *if all the vertices in* π *but the last one are not in* V′*. A* V′-path *in* G *is a* V′*-disjoint path whose last vertex is in* V′*. In particular, if* u ∈ V′*, the only* V′*-path starting from* u *is the trivial path* u*.*

We write u →V′-disjoint v (resp. u →V′-path v) if there exists a V′-disjoint path (resp. a V′-path) from u to v.

*Example.* In G0, u3; u2, u3; u0, u1; and u0, u2, u3 are V′0-paths and thus V′0-disjoint paths. u6, u0 is a V′0-disjoint path but not a V′0-path.

*Remark 1.* Definition 1 and the following ones are slightly different from [11], where a V′-path must contain at least two vertices and there is no constraint on its first vertex, which can be in V′ or not. Our definitions lead to the same notion of weak control-closure.

**Definition 2 (**V′**-weakly committing vertex).** *A vertex* u *in* G *is* V′-weakly committing *if all the* V′*-paths from* u *have the same end point (in* V′*). In particular, any vertex* u ∈ V′ *is* V′*-weakly committing.*

*Example.* In G0, u1 and u3 are the only V′0-weakly committing nodes.

**Definition 3 (Weakly control-closed set).** *A subset* V′ *of* V *is* weakly control-closed *in* G *if every vertex reachable from* V′ *is* V′*-weakly committing.*

*Example.* Since in particular u2 is not V′0-weakly committing and is reachable from V′0, V′0 is not weakly control-closed in G0. ∅, singletons and the set of all nodes V0 are trivially weakly control-closed. Less trivial weakly control-closed sets include {u0, u1}, {u4, u5, u6} and {u0, u1, u2, u3, u4, u6}.

Definition 3 characterizes a weakly control-closed set, but does not explain how to build one. It would be particularly interesting to build the smallest weakly control-closed set containing a given set V′. The notion of *weakly deciding vertex* will help us give an explicit expression for that set.

**Definition 4 (**V′**-weakly deciding vertex).** *A vertex* u *is* V′-weakly deciding *if there exist at least two non-trivial* V′*-paths from* u *that share no vertex except* u*. Let* WDG(V′) *denote the set of* V′*-weakly deciding vertices in* G*.*

*Property 1.* If u ∈ V′, then u ∉ WDG(V′) (by Definitions 1, 4).

*Example.* In G0, by Property 1, u1, u3 ∉ WDG0(V′0). We have illustrated the definition for nodes u0 and u5 in Sect. 2. We have WDG0(V′0) = {u0, u2, u4, u6}.

**Lemma 1 (Characterization of being weakly control-closed).** V′ *is weakly control-closed in* G *if and only if there is no* V′*-weakly deciding vertex in* G *reachable from* V′*.*

*Example.* In G0, u2 is reachable from V′0 and is V′0-weakly deciding. This gives another proof that V′0 is not weakly control-closed.

Here are two other useful properties of WDG.

*Property 2.* ∀ V′1, V′2 ⊆ V, V′1 ⊆ V′2 ⟹ WDG(V′1) ⊆ V′2 ∪ WDG(V′2).

*Property 3.* WDG(V′ ∪ WDG(V′)) = ∅.

We can prove that adding to a given set V′ the V′-weakly deciding nodes that are reachable from V′ gives a weakly control-closed set in G. This set is the smallest superset of V′ weakly control-closed in G.

**Lemma 2 (Existence of the weak control-closure).** *Let* W = WDG(V′) ∩ RG(V′) *denote the set of vertices in* WDG(V′) *that are reachable from* V′*. Then* V′ ∪ W *is the smallest weakly control-closed set containing* V′*.*

**Definition 5 (Weak control-closure).** *We call* weak control-closure *of* V′*, denoted* WCCG(V′)*, the smallest weakly control-closed set containing* V′*.*

*Property 4.* Let V′, V′1 and V′2 be subsets of V. Then


## **4 Main Lemmas**

This section gives two lemmas used to justify both Danicic's algorithm and ours.

**Lemma 3.** *Let* V′ *and* W *be two subsets of* V*. If* V′ ⊆ W ⊆ V′ ∪ WDG(V′)*, then* W ∪ WDG(W) = V′ ∪ WDG(V′)*. If moreover* V′ ⊆ W ⊆ WCCG(V′)*, then* WCCG(W) = WCCG(V′)*.*

*Proof.* Assume V′ ⊆ W ⊆ V′ ∪ WDG(V′). Since V′ ⊆ W, we have by Property 2, WDG(V′) ⊆ W ∪ WDG(W). Moreover, W ⊆ V′ ∪ WDG(V′), thus WDG(W) ⊆ V′ ∪ WDG(V′) ∪ WDG(V′ ∪ WDG(V′)) by Property 2, hence WDG(W) ⊆ V′ ∪ WDG(V′) by Property 3. These inclusions imply W ∪ WDG(W) = V′ ∪ WDG(V′).

If now V′ ⊆ W ⊆ WCCG(V′), we deduce WCCG(W) = WCCG(V′) from the previous result by intersecting with RG(V′), using Property 4a. □

Lemma 3 allows us to design iterative algorithms to compute the closure. Indeed, assume that we have a procedure which, for any non-weakly control-closed set V′, can return one or more elements of the weak control-closure of V′ not in V′. If we apply such a procedure to V′ once, we get a set W that satisfies V′ ⊆ W ⊆ WCCG(V′). From Lemma 3, WCCG(W) = WCCG(V′). To compute the weak control-closure of V′, it is thus sufficient to build the weak control-closure of W. We can apply our procedure again, this time to W, and repeatedly on all the successively computed sets. Since each set is a strict superset of the previous one, this iterative procedure terminates, because graph G is finite.

Before stating the second lemma, we introduce a key concept. It is called Θ in [11]. We use the name "observable" as in [26].

**Definition 6 (Observable).** *Let* u ∈ V*. The set of* observable vertices *from* u *in* V′*, denoted* obsG(u, V′)*, is the set of vertices* u′ *in* V′ *such that* u →V′-path u′*.*

*Remark 2.* A vertex u ∈ V′ is its unique observable: obsG(u, V′) = {u}.

The concept of observable set was illustrated in Fig. 2 (cf. Sect. 2).

**Lemma 4 (Sufficient condition for being** V′**-weakly deciding).** *Let* (u, v) *be an edge in* G *such that* u ∉ V′*,* v *can reach* V′ *and there exists a vertex* u′ *in* V′ *such that* u′ ∈ obsG(u, V′) *and* u′ ∉ obsG(v, V′)*. Then* u ∈ WDG(V′)*.*

*Proof.* We need to exhibit two V′-paths from u ending in V′ that share no vertex except u. We take the V′-path from u to u′ as the first one, and a V′-path connecting u to V′ through v as the second one (we construct it by prepending u to the smallest prefix of the path from v ending in V′ which is a V′-path). If these V′-paths intersected at a node y different from u, we would have a V′-path from v to u′ by concatenating the paths from v to y and from y to u′, which is contradictory. □

*Example.* In G0, obsG0(u0, V′) = {u1, u3} and obsG0(u1, V′) = {u1} (cf. Fig. 2a). Since u1 is a child of u0, we can apply Lemma 4 and deduce that u0 is V′-weakly deciding. Moreover, obsG0(u5, V′) = {u1, u3} and obsG0(u6, V′) = {u1, u3}. We cannot apply Lemma 4 to u5, and for good reason, since u5 is not V′-weakly deciding. But we cannot apply Lemma 4 to u6 either, since u6 and all its children u0, u4 and u5 have observable set {u1, u3} w.r.t. V′, while u6 is V′-weakly deciding. This shows that Lemma 4 gives a sufficient, but not a necessary, condition for proving that a vertex is weakly deciding.

*Example.* Let us apply Algorithm 1 to our running example G0 (cf. Fig. 1). Initially, W0 = V′ = {u1, u3}.


```
Input:    G = (V,E) a directed graph
          V′ ⊆ V
Output:   W ⊆ V the weak control-closure of V′
Ensures:  W = WCCG(V′)
1 begin
2   W ← V′
3   while there exists a W-critical edge in E do
4     choose such a W-critical edge (u, v)
5     W ← W ∪ {u}
6   end
7   return W
8 end
```
**Algorithm 1.** Danicic's original algorithm for weak control-closure [11]

## **5 Danicic's Algorithm**

We present here the algorithm described in [11]. This algorithm and a proof of its correctness have been formalized in Coq [17]. The algorithm is nearly completely justified by the following lemma (Lemma 5, equivalent to [11, Lemma 60]).

We first need to introduce a new concept, which captures edges that are of particular interest when searching for weakly deciding vertices. This concept is taken from [11], where it was not given a name. We call such edges *critical edges*.

**Definition 7 (Critical edge).** *An edge* (u, v) *in* G *is called* V -critical *if:*

*(1)* | obsG(u, V )| ≥ 2*; (2)* | obsG(v, V )| = 1*; (3)* u *is reachable from* V *in* G*.*

*Example.* In G0, (u0, u1), (u2, u3) and (u4, u3) are the V′-critical edges.

**Lemma 5.** *If* V *is not weakly control-closed in* G*, then there exists a* V*-critical edge* (u, v) *in* G*. Moreover, if* (u, v) *is such a* V*-critical edge, then* u ∈ WDG(V ) ∩ RG(V )*, therefore* u ∈ WCCG(V )*.*

*Proof.* Let x be a vertex in WDG(V ) reachable from V . There exists a V -path π from x ending in some x′ ∈ V . It follows that | obsG(x, V )| ≥ 2 and | obsG(x′, V )| = 1. Let u be the last vertex on π with at least two observable vertices in V and v its successor on π. Then (u, v) is a V -critical edge.

Assume now that there exists a V -critical edge (u, v). Since | obsG(u, V )| ≥ 2 and | obsG(v, V )| = 1, we have u ∉ V , v can reach V , and there exists u′ in obsG(u, V ) but not in obsG(v, V ). By Lemma 4, u ∈ WDG(V ); with condition (3), u is also reachable from V , thus u ∈ WCCG(V ). □

*Remark 3.* We can see in the proof above that we do not need the exact values 2 and 1. We just need strictly more observable vertices for u than for v and at least one observable for v, to satisfy the hypotheses of Lemma 4.

As described in Sect. 4, we can build an iterative algorithm constructing the weak control-closure of V by searching for critical edges on the intermediate sets built successively. This is the idea of Danicic's algorithm, shown as Algorithm 1.

*Proof of Algorithm 1.* To establish the correctness of the algorithm, we prove by induction that Wi, the value of W before iteration i + 1, satisfies both V ⊆ Wi and Wi ⊆ WCCG(V ) for any i. If i = 0, W0 = V , and both relations trivially hold. Let i be a natural number such that V ⊆ Wi, Wi ⊆ WCCG(V ) and there exists a Wi-critical edge (u, v). We have Wi+1 = Wi ∪ {u}. V ⊆ Wi+1 is straightforward. By Lemma 5, u ∈ WCCG(Wi). Therefore, by Lemma 3, u ∈ WCCG(V ), and thus Wi+1 ⊆ WCCG(V ). At the end of the algorithm, there is no W-critical edge, therefore W is weakly control-closed by Lemma 5. Since V ⊆ W and W ⊆ WCCG(V ), W = WCCG(V ) by Lemma 3. Termination follows from the fact that W strictly increases in the loop and is upper-bounded by WCCG(V ). □

In terms of complexity, [11] shows that, assuming that the degree of each vertex is at most 2 (and thus that O(|V |) = O(|E|)), the complexity of the algorithm is O(|V |³). Indeed, the main loop of Algorithm 1 is run at most O(|V |) times, and each loop body computes obs in O(|V |) for at most O(|V |) edges.
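For illustration, Algorithm 1 admits a direct, unoptimized transcription in Python (a sketch of ours; the paper's formalization is in Coq and its implementation in OCaml). The graph is a map from each vertex to the list of its successors:

```python
from collections import defaultdict, deque

def observables(succ, W):
    """obs[u] = vertices of W reachable from u by a W-path (Definition 6)."""
    pred = defaultdict(set)
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    obs = {w: {w} for w in W}
    for w in W:
        # backward BFS from w, never expanding through vertices of W
        queue, seen = deque(pred[w]), set()
        while queue:
            x = queue.popleft()
            if x in seen or x in W:
                continue
            seen.add(x)
            obs.setdefault(x, set()).add(w)
            queue.extend(pred[x])
    return obs

def reachable_from(succ, S):
    """Vertices reachable from the set S by a forward traversal."""
    seen, stack = set(S), list(S)
    while stack:
        for y in succ.get(stack.pop(), []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def weak_control_closure(succ, V):
    """Danicic's algorithm: add sources of W-critical edges until none is left."""
    W = set(V)
    while True:
        obs = observables(succ, W)
        reach = reachable_from(succ, W)
        critical = [u for u in succ for v in succ[u]
                    if len(obs.get(u, ())) >= 2      # Definition 7, condition (1)
                    and len(obs.get(v, ())) == 1     # condition (2)
                    and u in reach]                  # condition (3)
        if not critical:
            return W        # no W-critical edge: W is closed (Lemma 5)
        W.add(critical[0])  # one critical-edge source per iteration
```

On a small graph where c can reach both a and b of the initial subset (and is reachable from it), the closure adds c; a vertex with two observables but unreachable from the subset is correctly left out.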

*Remark 4.* We propose two optimizations for Algorithm 1:

– at each step, consider all critical edges rather than only one;

– use the weaker definition of critical edge suggested in Remark 3.

*Example.* We can replay Algorithm 1 using the first optimization. This run corresponds to the steps shown in Fig. 2. Initially, W0 = V′ = {u1, u3}.


1. The V′-critical edges in G0 are (u0, u1), (u2, u3) and (u4, u3), so all their sources are added at once: W1 = W0 ∪ {u0, u2, u4}.

2. u6 is now the source of a W1-critical edge, so W2 = W1 ∪ {u6}.

3. There is no W2-critical edge in G0.

The optimized version computes the weak control-closure of V′ in G0 in only 2 iterations instead of 4. This run also demonstrates that the algorithm is necessarily iterative: even when considering all V′-critical edges in the first step, u6 is not detected before the second step.

## **6 The Optimized Algorithm**

**Overview.** A potential source of inefficiency in Danicic's algorithm is the fact that no information is shared between the iterations: the observable sets are recomputed at each iteration since the target set changes. This is why the first optimization proposed in Remark 4 is interesting: it allows us to work longer with the same set and thus to reuse the observable sets.

We now propose to go even further: to store some information about the paths in the graph and reuse it in the *following* iterations. The main idea of the proposed algorithm is to label each processed node u with a node v ∈ W observable from u in the resulting set W progressively constructed by the algorithm. Labels survive across iterations and can be reused.

Unlike Danicic's algorithm, ours does not directly compute the weak control-closure. It actually computes the set W = V ∪ WDG(V ). To obtain the closure WCCG(V ) = W ∩ RG(V ), W is then simply filtered to keep only the vertices reachable from V (cf. Property 4a).

In addition to speeding up the algorithm, the usage of labels brings another benefit: for each node of G, its label indicates its observable vertex in W (when it exists) at the end of the algorithm. Recall that since WDG(W) = ∅ (by Property 3), each node in the graph has at most one observable vertex in W.

One difficult point with this approach is that the labels of the nodes need to be refreshed with care at each iteration so that they remain up-to-date. Actually, our algorithm does not ensure that at each iteration the label of each node is an observable vertex from this node in W. This state is only ensured at the beginning and at the end of the algorithm. Meanwhile, some nodes are still in the worklist and some labels are wrong, but this does not prevent the algorithm from working.

**Informal Description.** Our algorithm is given a directed graph G and a subset of vertices V in G. It manipulates three objects: a set W which is equal to V initially, which grows during the algorithm and which at the end contains the result, V ∪WDG(V ); a partial mapping obs associating at most one label obs[u] to each node u in the graph, this label being a vertex in W reachable from this node (and which is the observable from u in V ∪WDG(V ) at the end); a worklist L of nodes of the closure not processed yet. Each iteration proceeds as follows. If the worklist is not empty, a vertex u is extracted from it. All the vertices that transitively precede vertex u in the graph and that are not hidden by vertices in W are labeled with u. During the propagation, nodes that are good candidates to be V -weakly deciding are accumulated. After the propagation, we filter them so that only true V -weakly deciding nodes are kept. Each of these vertices is associated to itself in obs, and is added to W and L. If L is not empty, a new iteration begins. Otherwise, W is equal to V ∪WDG(V ) and obs associates each node in the graph with its observable vertex in the closure (when it exists).

Note that each iteration consists of two steps: a complete backward propagation in the graph, which collects potential V -weakly deciding vertices, and a filtering step. The predecessors of the propagated node are thus filtered twice: once during the propagation and once afterwards. We can try to filter as much as possible in the first step or, on the contrary, avoid filtering during the first step and do all the work in the second step. For the sake of simplicity of the mechanized proof, the version we chose does only simple filtering during the first step. We accumulate as candidate V -weakly deciding nodes all nodes that have at least two children and a label different from the one currently propagated, and we eliminate the false positives in the second step, once the propagation is done.

*Example.* Let us use our running example (cf. Fig. 1) to illustrate the algorithm. The successive steps are represented in Fig. 3. In these figures, nodes in W already processed (that is, in W \ L) are represented using a solid double circle, while nodes in W not yet processed (that is, still in worklist L) are represented using a dashed double circle. A label uj next to a node ui

**Fig. 3.** The optimized algorithm applied on G0, where V′ = {u1, u3}

means that uj is associated to ui, i.e. obs[ui] = uj. Let us detail the first steps of the algorithm. Initially, W0 = V′ = {u1, u3} (cf. Fig. 1).


As all nodes in W6 are already reachable from V′, W6 = WCCG(V′).

We can make two remarks on this example. First, as we can see in Fig. 3f, each node is labeled with its observable in W at the end of the algorithm. Second, in Fig. 3e, we have the case of a node with an obsolete label: u5 is labeled u4 while its only observable node in W is u6.

**Detailed Description.** Our algorithm is split into three functions:

– confirm is used to check if a given node is V -weakly deciding, by trying to find a child whose label differs from the node's own label, given as an argument.

– propagate performs a backward traversal from a given node, stopping at nodes in W, relabels the traversed nodes and collects candidate weakly deciding vertices.

– main maintains the worklist L, calls propagate on each extracted node, filters the resulting candidates using confirm, and updates W, obs and L accordingly.

**Input**: G = (V,E) a directed graph; obs : Map(V,V ) associating at most one label to each vertex of G; u, v ∈ V vertices in G

**Output**: b : bool

**Ensures**: b = true ⟺ ∃u′, (u, u′) ∈ E ∧ u′ ∈ obs ∧ obs[u′] ≠ v

**Algorithm 2.** Contract of confirm (G, obs, u, v)

**Input**: G = (V,E), W ⊆ V , obs : Map(V,V ), u, v ∈ V

**Output**: obs′, a new version of obs; C ⊆ V containing candidate W-weakly deciding nodes

**Requires**: (**P1**) ∀z ∈ V, obs[z] = v ⟺ z = u

**Requires**: (**P2**) u ∈ W

**Ensures**: (**Q1**) ∀z ∈ V, z −W-path→ u ⟹ obs′[z] = v

**Ensures**: (**Q2**) ∀z ∈ V, ¬(z −W-path→ u) ⟹ obs′[z] = obs[z]

**Ensures**: (**Q3**) ∀z ∈ C, z ≠ u ∧ z −W-path→ u

**Ensures**: (**Q4**) ∀z ∈ V, z ≠ u ∧ z −W-path→ u ∧ z ∈ obs ∧ |succG(z)| > 1 ⟹ z ∈ C

**Algorithm 3.** Contract of propagate (G, W, obs, u, v)


*Function Confirm.* A call to confirm(G, obs, u, v) takes four arguments: a graph G, a labeling obs of the graph vertices, and two vertices u and v. It returns true if and only if at least one child u′ of u in G has a label in obs different from v, which can be written u′ ∈ obs ∧ obs[u′] ≠ v. This simple function is left abstract here for lack of space. The Why3 formalization [17] contains a complete proof. Its contract is given as Algorithm 2.

*Function Propagate.* A call to propagate(G, W, obs, u, v) takes five arguments: a graph G, a subset W of nodes of G, a labeling of nodes obs, and two vertices u and v. It traverses G backwards from u (stopping at nodes in W) and updates obs so that all predecessors not hidden by vertices in W have label v at the end of the function. It returns a set of potential V -weakly deciding vertices. Again, this function is left abstract here but is proved in the Why3 development [17]. Its contract is given as Algorithm 3.

propagate requires that, when called, only u is labeled with v (P1) and that u ∈ W (P2). It ensures that, after the call, all the predecessors of u not hidden by a vertex in W are labeled v (Q1), the labels of the other nodes are unchanged (Q2), C contains only predecessors of u but not u itself (Q3), and all the predecessors that had a label before the call (different from v due to P1) and that have at least two children are in C (Q4).

**Input**: G = (V,E), a directed graph; V′ ⊆ V , the input subset

**Output**: W ⊆ V , the main result; obs : Map(V,V ), the final labeling

**Variables**: L ⊆ V , a worklist of nodes to be treated; C ⊆ V , a set of candidate V′-weakly deciding vertices; Δ ⊆ V , a set of new V′-weakly deciding vertices

**Ensures**: W = V′ ∪ WDG(V′)

**Ensures**: ∀u, v ∈ V, obs[u] = v ⟺ v ∈ obsG(u, W)

```
 1 begin
 2   W ← V′ ; obs|V′ ← idV′ ; L ← V′                 // initialization
 3   while L ≠ ∅ do                                  // main loop
       // invariant: I1 ∧ I2 ∧ I3 ∧ I4 ∧ I5 ∧ I6
       // variant: cardinal(L ∪ V \ W)
 4     u ← choose(L) ; L ← L \ {u}
 5     C ← propagate (G, W, obs, u, u)               // propagation
 6     Δ ← ∅
 7     while C ≠ ∅ do                                // filtering
 8       v ← choose(C) ; C ← C \ {v}
 9       if confirm (G, obs, v, u) = true then Δ ← Δ ∪ {v}
10     end
11     W ← W ∪ Δ ; obs|Δ ← idΔ ; L ← L ∪ Δ           // update
12   end
     // assert: A1 ∧ A2 ∧ A3 ∧ A4
13   return (W, obs)
14 end
```

(**I1**) ∀z ∈ W, obs[z] = z
(**I2**) ∀y, z ∈ V, obs[y] = z ⟹ z ∈ W
(**I3**) ∀y, z ∈ V, obs[y] = z ∧ z ∈ L ⟹ y = z
(**I4**) ∀y, z ∈ V, obs[y] = z ⟹ y −path→ z
(**I5**) V′ ⊆ W ⊆ V′ ∪ WDG(V′)
(**I6**) ∀y, z, z′ ∈ V, y −W-disjoint→ z ∧ obs[z] = z′ ∧ z′ ∉ L ⟹ obs[y] = z′

(**A1**) ∀u, v ∈ V, v ∈ obsG(u, W) ⟹ obs[u] = v
(**A2**) WDG(W) = ∅
(**A3**) V′ ⊆ W ⊆ V′ ∪ WDG(V′)
(**A4**) W = V′ ∪ WDG(V′)

**Algorithm 4.** Function main with annotations

*Function Main.* The main function of our algorithm is given as Algorithm 4. It takes two arguments: a graph G and a subset of vertices V . It returns V ∪ WDG(V ) and a labeling associating to each node its observable vertex in this set, if it exists. It maintains a worklist L of vertices that must still be processed. L is initially set to V , and the labels of its vertices to themselves (line 2). While L is not empty, a node u is taken from it and propagate(G, W, obs, u, u) is called (lines 3–5). It returns a set C of candidate V -weakly deciding nodes that are not added to W yet. They are first filtered using confirm (lines 6–10). The confirmed nodes (Δ) are then added to W and to L, and the label of each of them is updated to itself (line 11). The iterations stop when L is empty (cf. lines 3, 13).
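For illustration, the three functions can be transcribed as the following compact Python sketch, following the contracts of Algorithms 2 and 3 and the structure of Algorithm 4 (the proved implementation is the Why3/OCaml development [17]; the data layout here is our own):

```python
from collections import defaultdict

def confirm(succ, obs, z, v):
    """True iff some child of z carries a label different from v (Algorithm 2)."""
    return any(c in obs and obs[c] != v for c in succ.get(z, ()))

def propagate(pred, succ, W, obs, u, v):
    """Backward traversal from u stopping at W: labels with v every vertex
    having a W-path to u, and collects as candidates the already-labeled
    traversed vertices with at least two successors (Algorithm 3)."""
    C, stack, seen = set(), list(pred.get(u, ())), {u}
    while stack:
        z = stack.pop()
        if z in seen or z in W:
            continue
        seen.add(z)
        if z in obs and len(succ.get(z, ())) > 1:
            C.add(z)            # old label differs from v by precondition P1
        obs[z] = v
        stack.extend(pred.get(z, ()))
    return C

def main(succ, V):
    """Computes W = V ∪ WD(V) and the final labeling obs (Algorithm 4)."""
    pred = defaultdict(set)
    for x, ys in succ.items():
        for y in ys:
            pred[y].add(x)
    W, obs, L = set(V), {x: x for x in V}, list(V)
    while L:
        u = L.pop()
        C = propagate(pred, succ, W, obs, u, u)             # propagation
        delta = {z for z in C if confirm(succ, obs, z, u)}  # filtering
        W |= delta                                          # update
        for z in delta:
            obs[z] = z
        L.extend(delta)
    return W, obs
```

To obtain the weak control-closure, the returned W is then filtered to keep only the vertices reachable from V, as explained above. On a toy graph where c and d both have two disjoint paths into V = {a, b} but only c is reachable from V, main returns W = {a, b, c, d}, and the final filtering removes d.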

**Proof of the Optimized Algorithm.** We opted for Why3 instead of Coq for this proof, to take advantage of Why3's automation. Indeed, most of the goals could be discharged in less than a minute using Alt-Ergo, CVC4, Z3 and E. Some of them still needed to be proved manually in Coq, resulting in 330 lines of Coq proof. The Why3 development [17] focuses on the proof of the algorithm, not on the concepts presented in Sects. 3 and 4. Most of the concepts are proved; one of them is assumed in Why3 but was proved in Coq previously. Due to lack of space, we detail here only the main invariants necessary to prove main (cf. Algorithm 4). The proofs of I1, I2, I3 and I4 are rather simple, while those of I5 and I6 are more complex.

I<sup>1</sup> states that each node in W has itself as a label. It is true initially for all nodes in V and is preserved by the updates.

I<sup>2</sup> states that all labels are in W. This is true initially since all labels are in V . The preservation is verified, since all updates are realized using labels in W.

I<sup>3</sup> states that labels in L have not been already propagated. Given a node y in L, y is the only node whose label is y. It is true initially since every vertex in V has itself as a label. After an update, the new nodes obey the same rule, so I<sup>3</sup> is preserved.

I<sup>4</sup> states that if label z is associated to a node y then there exists a path between y and z. Initially, there exist trivial paths from each node in V to itself. When obs is updated, there exists a W-path, thus in particular a path.

I5 states that W remains between V and V ∪ WDG(V ) during the execution of the algorithm. The first part, V ⊆ W, is easy to prove, because it is true initially and W only grows. For the second part, we need to prove that, after the filtering, Δ ⊆ WDG(V ). For that, we prove that Δ ⊆ WDG(W), which is sufficient thanks to Lemma 3. Let v be a node in Δ. Since Δ ⊆ C, we know that v ∉ W and u ∈ obsG(v, W). Moreover, we have confirm(G, obs, v, u) = true, i.e. v has a child v′ such that v′ ∈ obs, hence v′ can reach W by I4, and obs[v′] ≠ u, hence u ∉ obsG(v′, W). We can apply Lemma 4 to the edge (v, v′) and deduce that v ∈ WDG(W).

I6 is the most complicated invariant. It states that if there is a path between two vertices y and z that does not intersect W, and z has a label already processed, then y and z have the same label. Let us give a sketch of the proof of preservation of I6 after an iteration of the main loop. Let us write obs′ for the map at the end of the iteration. Let y, z, z′ ∈ V be such that y −(W∪Δ)-disjoint→ z, obs′[z] = z′ and z′ ∉ (L \ {u}) ∪ Δ. Let us show that obs′[y] = z′. First, observe that neither y nor z can be in Δ, otherwise z′ would be in Δ, which would be contradictory. We examine four cases depending on whether the conditions z −W-path→ u (H1) and y −W-path→ u (H2) hold.


path connecting y and z which is also the origin of a W-path to u, and v<sup>2</sup> as its successor on this (W ∪ Δ)-disjoint path. We can show that v<sup>1</sup> ∈ Δ, which contradicts the fact that it lives on a (W ∪ Δ)-disjoint path.

We can now prove the assertions A1, A2, A3 and A4 at the end of main. A1 is a direct consequence of I6, since at the end L = ∅. A1 implies that each vertex u has at most one observable in W, namely obs[u] if u ∈ obs. A W-weakly deciding vertex would have two observables, thus WDG(W) = ∅ (A2). A3 is a direct consequence of I5. A4 can be deduced from A2 and Lemma 3 applied to A3. This proves that at the end W = V ∪ WDG(V ). To prove the other post-condition, we must show that if there are two nodes u, v such that obs[u] = v, then v ∈ obsG(u, W). By I4, there is a path from u to v. Let w be the first element of W on this path. Then there is a W-path from u to w, i.e. w ∈ obsG(u, W). By A1, obs[u] = w. Thus w = v and v ∈ obsG(u, W). This proves the second post-condition. □

## **7 Experiments**

We have implemented Danicic's algorithm (additionally improved by the two optimizations proposed in Remark 4) and ours in OCaml [17] using the OCamlgraph library [9], taking care to add a filtering step at the end of our algorithm to preserve only the nodes reachable from the initial subset. To gain confidence in their correctness, we have tested both implementations on small examples against a certified but slow Coq-extracted implementation used as an oracle. We have also carefully checked that the results returned by both implementations were the same in all experiments.

**Fig. 4.** Danicic's vs. our algorithm

We have experimentally evaluated both implementations on thousands of random graphs with up to thousands of vertices, generated by OCamlgraph. For every number of vertices between 10 and 1000 (resp. 6500) that is a multiple of 10, we generate 10 graphs with twice as many edges as vertices, randomly select three vertices to form the initial subset V , and run both algorithms (resp. only our algorithm) on them. Although the initial subsets are small, the resulting closures nearly always represent a significant part of the set of vertices of the graph. To avoid the trivial case, we have discarded the examples where the closure is restricted to the initial subset itself (where execution time is insignificant), and computed the average time of the remaining tests. Results are presented in Fig. 4. Experiments have been performed on an Intel Core i7 4810MQ with 8 cores at 2.80 GHz and 16 GB RAM.

We observe that the execution time of Danicic's algorithm explodes at a few hundred vertices, while our algorithm remains efficient for graphs with thousands of nodes.

## **8 Related Work and Conclusion**

**Related Work.** The last decades have seen various definitions of control dependence given for larger and larger classes of programs [6,12,13,21,22,27]. To consider programs with exceptions and potentially infinite loops, Ranganath et al. [23] and then Amtoft [2] introduced non-termination sensitive and non-termination insensitive control dependence on arbitrary program structures. Danicic et al. [11] further generalized control dependence to arbitrary directed graphs, by defining weak and strong control-closure, which subsume the previous non-termination insensitive and sensitive control dependence relations. They also gave a control dependence semantics in terms of projections of paths in the graph, allowing one to define new control dependence relations as long as they are compatible with it. This elegant framework was reused for slicing extended finite state machines [3] and probabilistic programs [4]. In both works, an algorithm computing weak control-closure, working differently from ours, was designed and integrated into a rather efficient slicing algorithm.

While there exist efficient algorithms to compute the dominator tree in a graph [8,10,16,19], and even certified ones [15], and thus efficient algorithms computing control dependence when defined in terms of post-dominance, algorithms in the general case [2,11,23] are at least cubic.

Mechanized verification of control dependence computation was done in formalizations of program slicing. Wasserrab [26] formalized language-independent slicing in Isabelle/HOL, but did not provide an algorithm. Blazy et al. [7] and our previous work [18] formalized control dependence in Coq, respectively, for an intermediate language of the CompCert C compiler [20] and on a WHILE language with possible errors.

**Conclusion and Future Work.** Danicic et al. claim that weak control-closure subsumes all other non-termination insensitive variants. It was thus a natural candidate for mechanized formalization. We used the Coq proof assistant to formalize it. A certified implementation of the algorithm can be extracted from the Coq development. During the formalization in Coq of the algorithm and its proof, we detected an inconsistency in a secondary proof, which highlights how useful proof assistants are to detect otherwise overlooked cases. To the best of our knowledge, the present work is the first mechanized formalization of weak control-closure and of an algorithm to compute it. In addition to formalizing Danicic's algorithm in Coq, we have designed, formalized and proved a new one, which is experimentally shown to be faster than the original one. Short-term future work includes considering further optimizations. Long-term future work is to build a verified generic slicer. Indeed, generic control dependence is a first step towards it. Adding data dependence is the next step in this direction.

**Acknowledgements.** We thank the anonymous reviewers for helpful suggestions.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Specification and Program Testing

# **Iterative Generation of Diverse Models for Testing Specifications of DSL Tools**

Oszkár Semeráth1,2(B) and Dániel Varró1,2,3

<sup>1</sup> MTA-BME Lendület Cyber-Physical Systems Research Group, Budapest, Hungary

{semerath,varro}@mit.bme.hu <sup>2</sup> Department of Measurement and Information Systems,

Budapest University of Technology and Economics, Budapest, Hungary

<sup>3</sup> Department of Electrical and Computer Engineering, McGill University, Montreal, Canada

**Abstract.** The validation of modeling tools of custom domain-specific languages (DSLs) frequently relies upon an automatically generated set of models as a test suite. While many software testing approaches recommend that this test suite should be diverse, model diversity has not been studied systematically for graph models. In the paper, we propose diversity metrics for models by exploiting neighborhood shapes as abstraction. Furthermore, we propose an iterative model generation technique to synthesize a diverse set of models where each model is taken from a different equivalence class as defined by neighborhood shapes. We evaluate our diversity metrics in the context of mutation testing for an industrial DSL and compare our model generation technique with the popular model generator Alloy.

## **1 Introduction**

**Motivation.** Domain-Specific Language (DSL) based modeling tools play an increasing role in software development processes. Advanced DSL frameworks such as Xtext or Sirius, built on top of model management frameworks such as the Eclipse Modeling Framework (EMF) [37], significantly improve the productivity of domain experts by automating the production of rich editor features.

Modeling environments may provide validation for the system under design from an early stage of development, with efficient tool support for checking well-formedness (WF) constraints and design rules over large model instances of the DSL using tools like Eclipse OCL [24] or graph queries [41]. Model generation techniques [16,19,35,39] are able to automatically provide a range of solution candidates for allocation problems [19], model refactoring, or context generation [21]. Finally, models can be processed by query-based transformations or code generators to automatically synthesize source code or other artifacts.

The design of complex DSL tools is a challenging task. As the complexity of DSL tools increases, special attention is needed to validate the modeling tools themselves (e.g. for tool qualification purposes) to ensure that WF constraints and the preconditions of model transformation and code generation functionality [4,32,35] are correctly implemented in the tool.

**Problem Statement.** There are many approaches aiming to address the testing of DSL tools (or transformations) [1,6,42] which necessitate *the automated synthesis of graph models* to serve as test inputs. Many best practices of testing (such as equivalence partitioning [26] and mutation testing [18]) recommend the synthesis of *diverse* graph models, where any two models are structurally different from each other, to achieve high coverage or a diverse solution space.

While software diversity is widely studied [5], existing diversity metrics for graph models are much less elaborate [43]. Model comparison techniques [38] frequently rely upon the existence of node identifiers, which can easily lead to many isomorphic models. Moreover, checking graph isomorphism is computationally very costly. Therefore practical solutions tend to use approximate techniques to achieve certain diversity by random sampling [17], incremental generation [19,35], or using symmetry breaking predicates [39]. Unlike equivalence partitions which capture diversity of inputs in a customizable way for testing traditional software, a similar diversity concept is still missing for graph models.

**Contribution.** In this paper, we propose *diversity metrics* to characterize a single model and a set of models. For that purpose, we innovatively reuse neighborhood graph shapes [28], which provide a fine-grained typing for each object based on the structure (e.g. incoming and outgoing edges) of its neighborhood. Moreover, we propose an *iterative model generation technique* to automatically synthesize a diverse set of models for a DSL where each model is taken from a different equivalence class wrt. graph shapes as an equivalence relation.

We evaluate our diversity metrics and model generator in the context of mutation-based testing [22] of WF constraints in an industrial DSL tool. We evaluate and compare the *mutation score* and *our diversity metrics* of test suites obtained from (1) an Alloy-based model generator (using symmetry-breaking predicates to ensure diversity), (2) an iterative graph-solver-based generator using neighborhood shapes, and (3) real models created by humans. Our finding is that a diverse set of models derived along different neighborhood shapes has a better mutation score. Furthermore, based on a test suite with 4850 models, we found a high correlation between mutation score and our diversity metrics, which indicates that our metrics may be good predictors in practice for testing.

**Added Value.** To the best of our knowledge, our paper is one of the first studies on (software) model diversity. From a testing perspective, our diversity metrics provide a stronger characterization of a test suite of models than traditional metamodel coverage, which is used in many research papers. Furthermore, model generators using neighborhood graph shapes (which keep models only if they are surely non-isomorphic) provide increased diversity compared to symmetry breaking predicates (which exclude models only if they are surely isomorphic).

## **2 Preliminaries**

Core modeling concepts and testing challenges of DSL tools will be illustrated in the context of Yakindu Statecharts [46], which is an industrial DSL for developing reactive, event-driven systems, and supports validation and code generation.

#### **2.1 Metamodels and Instance Models**

Metamodels define the main concepts, relations and attributes of a domain to specify the basic graph structure of models. A simplified metamodel for Yakindu state machines is illustrated in Fig. 1 using the popular Eclipse Modeling Framework (EMF) [37] for domain modeling. A state machine consists of Regions, which in turn contain states (called Vertexes) and Transitions. An abstract state Vertex is further refined into RegularStates (like State or FinalState) and PseudoStates (like Entry, Exit or Choice).

**Fig. 1.** Metamodel extract from Yakindu state machines

Formally [32,34], a metamodel defines a vocabulary of type and relation symbols Σ = {C_1, …, C_n, R_1, …, R_m} where a unary predicate symbol C_i is defined for each *EClass*, and a binary predicate symbol R_j is derived for each *EReference*. For space considerations, we omit the precise handling of attributes.

An *instance model* can be represented as a logic structure M = ⟨Obj_M, I_M⟩ where Obj_M is the finite set of objects (the size of the model is |M| = |Obj_M|), and I_M provides an interpretation for all predicate symbols in Σ as follows:


A metamodel also specifies extra structural constraints (type hierarchy, multiplicities, etc.) that need to be satisfied in each valid instance model [32].

*Example 1.* Figure 2 shows graph representations of three (partial) instance models. For the sake of clarity, Regions and the inverse relations incomingTransitions and outgoingTransitions are excluded from the diagram. In M1 there are two States (s1 and s2), which are connected in a loop via Transitions t2 and t3. The initial state is marked by a Transition t1 from an entry e1 to state s1. M2 describes a similar statechart with three states in a loop (s3, s4 and s5 connected via t5, t6 and t7). Finally, in M3 there are two main differences: there is an incoming Transition t11 to an Entry state (e3), and there is a State s7 that does not have an outgoing transition. While M1 and M2 are non-isomorphic, later we illustrate why they are not diverse.

**Fig. 2.** Example instance models (as directed graphs)

$$\begin{aligned}
[\mathsf{C}(v)]^M_Z &:= \mathcal{I}_M(\mathsf{C})(Z(v)) &\quad [\varphi_1 \wedge \varphi_2]^M_Z &:= [\varphi_1]^M_Z \wedge [\varphi_2]^M_Z \\
[\mathsf{R}(v_1, v_2)]^M_Z &:= \mathcal{I}_M(\mathsf{R})(Z(v_1), Z(v_2)) &\quad [\varphi_1 \vee \varphi_2]^M_Z &:= [\varphi_1]^M_Z \vee [\varphi_2]^M_Z \\
[v_1 = v_2]^M_Z &:= Z(v_1) = Z(v_2) &\quad [\neg\varphi]^M_Z &:= \neg[\varphi]^M_Z \\
[\forall v : \varphi]^M_Z &:= \bigwedge_{x \in Obj_M} [\varphi]^M_{Z, v \mapsto x} &\quad [\exists v : \varphi]^M_Z &:= \bigvee_{x \in Obj_M} [\varphi]^M_{Z, v \mapsto x}
\end{aligned}$$

**Fig. 3.** Inductive semantics of graph predicates

#### **2.2 Well-Formedness Constraints as Logic Formulae**

In many industrial modeling tools, WF constraints are captured either by OCL constraints [24] or graph patterns (GP) [41] where the latter captures structural conditions over an instance model as paths in a graph. To have a unified and precise handling of evaluating WF constraints, we use a tool-independent logic representation (which was influenced by [29,32,34]) that covers the key features of concrete graph pattern languages and a first-order fragment of OCL.

**Syntax.** A graph predicate is a first-order logic predicate ϕ(v_1, …, v_n) over (object) variables which can be inductively constructed by using class and relation predicates C(v) and R(v_1, v_2), equality checks =, standard first-order logic connectives ¬, ∨, ∧, and quantifiers ∃ and ∀.

**Semantics.** A graph predicate ϕ(v_1, …, v_n) can be evaluated on a model M along a variable binding Z : {v_1, …, v_n} → Obj_M from variables to objects in M. The truth value of ϕ over model M along the mapping Z (denoted by [ϕ(v_1, …, v_n)]^M_Z) is evaluated in accordance with the semantic rules defined in Fig. 3.

A variable binding Z for which the predicate ϕ evaluates to 1 over M, formally [ϕ]^M_Z = 1, is often called a pattern match. Otherwise, if no binding Z satisfies the predicate, i.e. [ϕ]^M_Z = 0 for all Z, then the predicate ϕ evaluates to 0 over M. Graph query engines like [41] can retrieve (one or all) matches of a graph predicate over a model. When graph patterns are used for validating WF constraints, a match of a pattern usually denotes a violation, thus the corresponding graph formula needs to capture the erroneous case.
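The evaluation rules of Fig. 3 can be sketched in a few lines of Python. The model encoding below (class-name sets per object, edge triples) and all identifiers are illustrative assumptions of ours, not part of any of the cited tools:

```python
from itertools import product

# A tiny model: objects with class labels (unary predicates) and typed
# edges (binary predicates). All names here are illustrative.
objects = {"e1", "t1", "s1"}
classes = {"e1": {"Entry", "PseudoState", "Vertex"},
           "t1": {"Transition"},
           "s1": {"State", "RegularState", "Vertex"}}
edges = {("t1", "source", "e1"), ("t1", "target", "s1")}

def C(name, o):            # class predicate C(v)
    return name in classes[o]

def R(name, o1, o2):       # reference predicate R(v1, v2)
    return (o1, name, o2) in edges

def matches(pred, arity):
    """Enumerate all bindings Z where the predicate evaluates to 1."""
    return [Z for Z in product(objects, repeat=arity) if pred(*Z)]

# Graph predicate with one free variable T:
# phi(T) := Transition(T) /\ exists V : target(T, V)
phi = lambda T: C("Transition", T) and any(R("target", T, V) for V in objects)
print(matches(phi, 1))  # the single match binds T to t1
```

The universal and existential quantifiers of Fig. 3 correspond directly to `all`/`any` over the finite object set, mirroring the big conjunction and disjunction in the semantic rules.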

#### **2.3 Motivation: Testing of DSL Tools**

A code generator would normally assume that the input models are well-formed, i.e. all WF constraints are validated prior to calling the code generator. However, there is no guarantee that the WF constraints actually checked by the DSL tool are exactly the same as the ones required by the code generator. For instance, if the validation forgets to check a subclause of a WF constraint, then runtime errors may occur during code generation. Moreover, the precondition of a transformation rule may also contain errors. Therefore, WF constraints and model transformations of DSL tools should be systematically tested. Alternatively, model validation can be interpreted as a special case of model transformation, where the preconditions of the transformation rules are fault patterns, and the actions place error markers on the model [41].

A popular approach for testing DSL tools is mutation testing [22,36], which aims to reveal missing or extra predicates by (1) deriving a set of mutants (e.g. WF constraints in our case) by applying a set of mutation operators. Then (2) the test suite is executed for both the original and the mutant programs, and (3) their outputs are compared. (4) A mutant is killed by a test if different output is produced for the two cases (i.e. a different match set). (5) The mutation score of a test suite is calculated as the ratio of mutants killed by some test wrt. the total number of mutants. A test suite with a better mutation score is preferred [18].

**Fault Model and Detection.** As a fault model, we consider omission faults in WF constraints of DSL tools where some subconstraints are not actually checked. In our fault model, a WF constraint is given in a conjunctive normal form ϕ_e = ϕ_1 ∧ ··· ∧ ϕ_k, all unbound variables are quantified existentially (∃), and it may refer to other predicates specified in the same form. Note that this format is equivalent to first-order logic, and does not reduce the range of supported graph predicates. We assume that in a faulty predicate (a mutant) the developer may forget to check one of the predicates ϕ_i (Constraint Omission, CO), i.e. ϕ_e = [ϕ_1 ∧ ··· ∧ ϕ_i ∧ ··· ∧ ϕ_k] is rewritten to ϕ_f = [ϕ_1 ∧ ··· ∧ ϕ_{i−1} ∧ ϕ_{i+1} ∧ ··· ∧ ϕ_k], or may forget a negation (Negation Omission, NO), i.e. ϕ_e = [ϕ_1 ∧ ··· ∧ (¬ϕ_i) ∧ ··· ∧ ϕ_k] is rewritten to ϕ_f = [ϕ_1 ∧ ··· ∧ ϕ_i ∧ ··· ∧ ϕ_k]. Given an instance model M, we assume that both [ϕ_e]^M and the faulty [ϕ_f]^M can be evaluated separately by the DSL tool. Now a test model M detects a fault if there is a variable binding Z where the two evaluations differ, i.e. [ϕ_e]^M_Z ≠ [ϕ_f]^M_Z.
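The CO mutation operator and the kill check can be illustrated with a small sketch. We assume, for illustration only, that the *incomingToEntry* constraint is the conjunction "E is an Entry" ∧ "E has an incoming target edge"; the model encoding and names are ours:

```python
# Objects and facts of a tiny test model (illustrative encoding).
objects = {"e1", "e3", "s1", "t1", "t11"}
classes = {"e1": {"Entry"}, "e3": {"Entry"}, "s1": {"State"},
           "t1": {"Transition"}, "t11": {"Transition"}}
edges = {("t11", "target", "e3")}   # only e3 has an incoming transition

def entry(E):
    return "Entry" in classes[E]

def has_incoming_target(E):
    return any((T, "target", E) in edges for T in objects)

def match_set(clauses):
    # Match set of the conjunction phi_1 /\ ... /\ phi_k (one free variable).
    return {E for E in objects if all(c(E) for c in clauses)}

original = [entry, has_incoming_target]
# Constraint-omission (CO) mutants: drop one conjunct at a time.
mutants = [original[:i] + original[i + 1:] for i in range(len(original))]

# A model kills a mutant iff the match sets of original and mutant differ.
killed = [match_set(m) != match_set(original) for m in mutants]
print(killed)  # [False, True]
```

Note that the first mutant survives on this model: dropping the Entry clause changes nothing here because only e3 has an incoming target edge anyway, which is exactly why the structure of the test model matters for the mutation score.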

*Example 2.* Two WF constraints checked by the Yakindu environment can be captured by graph predicates as follows:


According to our fault model, we can derive two mutants for *incomingToEntry* as predicates ϕ_f1 := Entry(E) and ϕ_f2 := ∃T : target(T, E).

Constraints ϕ and φ are satisfied in models M1 and M2 as the corresponding graph predicates have no matches, thus [ϕ]^M1_Z = 0 and [φ]^M1_Z = 0. As test models, both M1 and M2 are able to detect the same omission fault for ϕ_f1 as [ϕ_f1]^M1 = 1 (with E ↦ e1 and E ↦ e2, respectively), and similarly for ϕ_f2 (with s1 and s3). However, M3 is unable to kill mutant ϕ_f1 (as ϕ had a match E ↦ e3 which remains a match of ϕ_f1), but it is able to detect the others.

## **3 Model Diversity Metrics for Testing DSL Tools**

As a general best practice in testing, a good test suite should be diverse, but the interpretation of diversity may differ. For example, equivalence partitioning [26] partitions the input space of a program into equivalence classes based on observable output, and then selects the test cases of a test suite from different equivalence classes to achieve a diverse test suite. However, while software diversity has been studied extensively [5], model diversity is much less covered.

In existing approaches [6,7,9,10,31,42] for testing DSL and transformation tools, a test suite should provide full *metamodel coverage* [45], and it should also guarantee that any pair of models in the test suite is non-isomorphic [17,39]. In [43], the diversity of a model M_i is defined as the number of (direct) types used from its metamodel MM, i.e. M_i is more diverse than M_j if more types of MM are used in M_i than in M_j. Furthermore, a model generator Gen deriving a set of models {M_i} is diverse if there is a designated distance between each pair of models M_i and M_j: dist(M_i, M_j) > D, but no concrete distance function is proposed.

Below, we propose diversity metrics for a single model, for pairs of models and for a set of models based on neighborhood shapes [28], a formal concept known from the state space exploration of graph transformation systems [27]. Our diversity metrics generalize both metamodel coverage and (graph) isomorphism tests, which are derived as two extremes of the proposed metric, and thus it defines a finer grained equivalence partitioning technique for graph models.

#### **3.1 Neighborhood Shapes of Graphs**

A neighborhood *Nbh<sup>i</sup>* describes the local properties of an object in a graph model for a range of size i <sup>∈</sup> <sup>N</sup> [28]. The neighbourhood of an object o describes all unary (class) and binary (reference) relations of the objects within the given range. Informally, neighbourhoods can be interpreted as richer types, where the original classes are split into multiple subclasses based on the difference in the incoming and outgoing references. Formally, neighborhood descriptors are defined recursively with the set of class and reference symbols Σ:


The shaping function nbh_i : Obj_M → Nbh_i maps each object in a model M to a neighborhood with range i: (1) if i = 0, then nbh_0(o) = {C | [C(o)]^M = 1}; (2) if i > 0, then nbh_i(o) = ⟨nbh_{i−1}(o), in, out⟩, where

$$\begin{aligned} in &= \{ \langle \mathbb{R}, n \rangle | \exists o' \in Obj\_M : \left[ \mathbb{R}(o', o) \right]^M \land n = nbh\_{i-1}(o') \} \\ out &= \{ \langle \mathbb{R}, n \rangle | \exists o' \in Obj\_M : \left[ \mathbb{R}(o, o') \right]^M \land n = nbh\_{i-1}(o') \} \end{aligned}$$

A *(graph) shape* of a model M for range i (denoted as S_i(M)) is the set of neighborhood descriptors occurring in the model: S_i(M) = {x | ∃o ∈ Obj_M : nbh_i(o) = x}. A shape can be interpreted and illustrated as a type graph: after calculating the neighborhood for each object, each neighborhood is represented as a node in the graph shape. Moreover, if there exists at least one link between objects in two different neighborhoods, the corresponding nodes in the shape will be connected by an edge. We will use the size of a shape |S_i(M)|, which is the number of neighborhoods occurring in M.
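The recursive definition of nbh_i and the shape S_i(M) can be sketched directly in Python, under an illustrative model encoding (objects, a class-label map, and edge triples) that is our own and not the paper's actual tooling:

```python
def nbh(model, o, i):
    """Neighborhood descriptor of object o for range i, as a hashable value."""
    objects, classes, edges = model
    if i == 0:                              # nbh_0(o): the set of classes of o
        return frozenset(classes[o])
    # in/out: reference symbol paired with the (i-1)-neighborhood of the peer
    inc = frozenset((r, nbh(model, s, i - 1)) for (s, r, t) in edges if t == o)
    out = frozenset((r, nbh(model, t, i - 1)) for (s, r, t) in edges if s == o)
    return (nbh(model, o, i - 1), inc, out)

def shape(model, i):
    """S_i(M): the set of range-i neighborhoods occurring in the model."""
    objects, _, _ = model
    return {nbh(model, o, i) for o in objects}

# A two-state loop in the spirit of M1 (object names are illustrative):
classes = {"e1": frozenset({"Entry"}), "s1": frozenset({"State"}),
           "s2": frozenset({"State"}), "t1": frozenset({"Transition"}),
           "t2": frozenset({"Transition"}), "t3": frozenset({"Transition"})}
edges = {("t1", "source", "e1"), ("t1", "target", "s1"),
         ("t2", "source", "s1"), ("t2", "target", "s2"),
         ("t3", "source", "s2"), ("t3", "target", "s1")}
M1 = (set(classes), classes, edges)

print(len(shape(M1, 0)), len(shape(M1, 1)))  # 3 4
```

Range 0 yields one neighborhood per class (3), while range 1 splits t1 from t2 and t3 (their sources differ) but keeps s1 and s2 together, giving 4 neighborhoods, consistent with the 4 shapes of M1 in Fig. 4 (P1: the count grows with the range).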

*Example 3.* We illustrate the concept of graph shapes for model M1. For range 0, objects are mapped to class names as neighborhood descriptors:

– nbh_0(e) = {Entry, PseudoState, Vertex}

For range 1, objects with different incoming or outgoing types are further split, e.g. the neighborhood of t1 is different from that of t2 and t3, as t1 is connected to an Entry along a source reference, while the sources of t2 and t3 are States.


For range 2, each object of M1 would be mapped to a unique element. In Fig. 4, the neighborhood shapes of models M1, M2, and M3 for range 1 are represented in a visual notation adapted from [28,29] (without additional annotations, e.g. multiplicities or predicates used for verification purposes). The trace of the

**Fig. 4.** Sample neighborhood shapes of *M*1, *M*<sup>2</sup> and *M*<sup>3</sup>

concrete graph nodes to neighbourhoods is illustrated on the right. For instance, the Entries e1 and e2 in M1 and M2 are both mapped to the same neighbourhood n1, while e3 can be distinguished from them as it has an incoming reference from a transition, thus creating a different neighbourhood n5.

**Properties of Graph Shapes.** The theoretical foundations of graph shapes [28,29] prove several key semantic properties which are exploited in this paper:

– **P1** There are only a *finite number of graph shapes for a certain range*, and a smaller range reduces the number of graph shapes, i.e. |S_i(M)| ≤ |S_{i+1}(M)|.
– **P2** |S_i(M_j)| + |S_i(M_k)| ≥ |S_i(M_j ∪ M_k)| ≥ |S_i(M_j)| and |S_i(M_k)|.

#### **3.2 Metrics for Model Diversity**

We define two metrics for model diversity based upon neighborhood shapes. *Internal diversity* captures the diversity of a single model, i.e. it can be evaluated individually for each and every generated model. As neighborhood shapes introduce extra subtypes for objects, this model diversity metric measures the number of neighborhood types used in the model with respect to the size of the model. *External diversity* captures the distance between pairs of models. Informally, this diversity distance between two models will be proportional to the number of different neighborhoods covered in one model but not the other.

**Definition 1 (Internal model diversity).** *For a range* i *of neighborhood shapes for model* M*, the internal diversity of* M *is the number of shapes wrt. the size of the model:* d^int_i(M) = |S_i(M)| / |M|*.*

The range of the internal diversity metric d^int_i(M) is [0..1], and a model M with d^int_1(M) = 1 (and |M| ≥ |MM|) *guarantees full metamodel coverage* [45], i.e. it surely contains all elements of the metamodel as types. As such, it is an appropriate diversity metric for a model in the sense of [43]. Furthermore, given a specific range i, the number of potential neighborhood shapes within that range is finite, but it grows superexponentially. Therefore, for a small range i, one can derive a model M_j with d^int_i(M_j) = 1, but for larger models M_k (with |M_k| > |M_j|) we will likely have d^int_i(M_j) ≥ d^int_i(M_k). However, due to the rapid growth of the number of shapes for increasing range i, in most practical cases d^int_i(M_j) will converge to 1 if M_j is sufficiently diverse.

**Definition 2 (External model diversity).** *Given a range* i *of neighborhood shapes, the external diversity of models* M_j *and* M_k *is the number of shapes contained exclusively in* M_j *or* M_k *but not in the other, formally,* d^ext_i(M_j, M_k) = |S_i(M_j) ⊕ S_i(M_k)|*, where* ⊕ *denotes the symmetric difference of two sets.*

External model diversity allows us to compare two models. One can show that this metric is a (pseudo-)distance in the mathematical sense [2], and thus it can serve as a diversity metric for a model generator in accordance with [43].

**Definition 3 (Pseudo-distance).** *A function* d : M×M → <sup>R</sup> *is called a (pseudo-)distance, if it satisfies the following properties:*


*– non-negativity:* d(M_j, M_k) ≥ 0 *with* d(M_j, M_j) = 0
*– symmetry:* d(M_j, M_k) = d(M_k, M_j)
*– triangle inequality:* d(M_j, M_l) ≤ d(M_j, M_k) + d(M_k, M_l)

**Corollary 1.** *External model diversity* d^ext_i(M_j, M_k) *is a (pseudo-)distance between models* M_j *and* M_k *for any* i*.*

During model generation, we will exclude a model M_k if d^ext_i(M_j, M_k) = 0 for a previously derived model M_j, but *this does not imply that they are isomorphic*. Thus our definition allows us to avoid graph isomorphism checks between M_j and M_k, which have high computational complexity. Note that external diversity is a dual of the symmetry breaking predicates [39] used in the Alloy Analyzer, where d(M_j, M_k) = 0 implies that M_j and M_k are isomorphic (and not vice versa).

**Definition 4 (Coverage of model set).** *Given a range* i *of neighborhood shapes and a set of models* MS = {M_1, …, M_k}*, the coverage of this model set is defined as* cov_i(MS) = |S_i(M_1) ∪ ··· ∪ S_i(M_k)|*.*

The coverage of a model set is not normalised, but its value monotonically grows for any range i when adding new models. Thus it corresponds to our expectation that adding a new test case to a test suite should increase its coverage.
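Once the shapes are computed, Definitions 1, 2 and 4 reduce to simple set operations. A minimal sketch, where the shape sets labelled n1..n7 mirror the neighborhoods of Fig. 4 and everything else is illustrative:

```python
def internal_diversity(shapes, model_size):
    """d_int_i(M) = |S_i(M)| / |M|, a value in [0..1]."""
    return len(shapes) / model_size

def external_diversity(shapes_j, shapes_k):
    """d_ext_i(M_j, M_k): size of the symmetric difference of the shape sets."""
    return len(shapes_j ^ shapes_k)

def coverage(shape_list):
    """cov_i(MS) = |S_i(M_1) u ... u S_i(M_k)|."""
    return len(set().union(*shape_list))

# Shape sets in the spirit of Fig. 4 (n1..n7 stand for the neighborhoods):
S_M1 = {"n1", "n2", "n3", "n4"}
S_M2 = {"n1", "n2", "n3", "n4"}               # M2 has the same shape as M1
S_M3 = {"n2", "n3", "n4", "n5", "n6", "n7"}

print(internal_diversity(S_M1, 6))            # 4/6 for a 6-object model
print(external_diversity(S_M1, S_M2))         # 0: M2 adds no new neighborhoods
print(external_diversity(S_M1, S_M3))         # 4: n1 vs. n5, n6, n7
print(coverage([S_M1, S_M2, S_M3]))           # 7
```

The printed values reproduce the metric values computed by hand in Example 4.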

*Example 4.* Let us calculate the different diversity metrics for M1, M2 and M3 of Fig. 2. For range 1, they have the shapes illustrated in Fig. 4. The internal diversity of these models is d^int_1(M1) = 4/6, d^int_1(M2) = 4/8 and d^int_1(M3) = 6/7, thus M3 is the most diverse model among them. As M1 and M2 have the same shape, the distance between them is d^ext_1(M1, M2) = 0. The distance between M1 and M3 is d^ext_1(M1, M3) = 4, as M1 has 1 different neighbourhood (n1) and M3 has 3 (n5, n6 and n7). The set coverage of M1, M2 and M3 is 7 altogether, as they have 7 different neighbourhoods (n1 to n7).

## **4 Iterative Generation of Diverse Models**

Now we aim at generating a diverse set of models MS = {M_1, M_2, …, M_k} for a given metamodel MM (and potentially, a set of constraints WF). Our approach (see Fig. 5) intentionally reuses several components as building blocks obtained from existing research results aiming to derive consistent graph models. First, model generation is an iterative process where previous solutions serve as further constraints [35]. Second, it repeatedly calls a back-end graph solver [33,44] to automatically derive consistent instance models which satisfy WF.

**Fig. 5.** Generation of diverse models

As a key conceptual novelty, we enforce the structural diversity of models during the generation process using neighborhood shapes at different stages. Most importantly, if the shape S_i(M_n) of a new instance model M_n obtained as a candidate solution is identical to the shape S_i(M_j) of a previously derived model M_j for a predefined (input) neighborhood range i, the solution candidate is discarded, and iterative generation continues towards a new candidate.

Internally, our tool operates over partial models [30,34] where instance models are derived along a refinement calculus [43]. The shapes of intermediate (partial) models found during model generation are continuously being computed. As such, they may help guide the search process of model generation by giving preference to refine (partial) model candidates that likely result in a different graph shape. Furthermore, this extra bookkeeping also pays off once a model candidate is found since comparing two neighborhood shapes is fast (conceptually similar to lexicographical ordering). However, our concepts could be adapted to postprocess the output of other (black-box) model generator tools.

*Example 5.* As an illustration of the iterative generation of diverse models, let us imagine that model M1 (in Fig. 2) is retrieved first by the model generator. Shape S_2(M1) is then calculated (see Fig. 4), and since there is no other model with the same shape, M1 is stored as a solution. If the model generator retrieves M2 as the next solution candidate, it turns out that S_2(M2) = S_2(M1), thus M2 is excluded. Next, if model M3 is generated, it will be stored as a solution since S_2(M3) ≠ S_2(M2). Note that we intentionally omitted the internal search procedure of the model generator to focus on the use of neighborhood shapes.
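The shape-based filtering loop of Fig. 5 can be sketched as follows, assuming a black-box candidate generator and a shape function as in Sect. 3.1; all names are illustrative, and models are abstracted to their shape sets so that the M1/M2/M3 scenario of Example 5 can be replayed:

```python
def generate_diverse(solver, shape, i, k):
    """Keep candidates from `solver` until k models with pairwise
    different range-i shapes are collected; discard shape duplicates."""
    solutions, seen_shapes = [], set()
    for candidate in solver:            # subsequent solver calls
        s = shape(candidate, i)
        if s in seen_shapes:            # same shape as a previous solution:
            continue                    # discard, continue the generation
        solutions.append(candidate)
        seen_shapes.add(s)
        if len(solutions) == k:
            break
    return solutions

# Candidates stand for M1, M2 (same shape as M1, excluded) and M3 (kept):
candidates = [{"n1", "n2", "n3", "n4"},
              {"n1", "n2", "n3", "n4"},
              {"n2", "n3", "n4", "n5", "n6", "n7"}]
picked = generate_diverse(iter(candidates),
                          lambda M, i: frozenset(M), 2, 2)
print(len(picked))  # 2: the duplicate-shape candidate was discarded
```

Because shape comparison replaces pairwise isomorphism checks, the bookkeeping per candidate is a single set-membership test on a hashable shape descriptor.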

Finally, it is worth highlighting that graph shapes are conceptually different from other approaches aiming to achieve diversity. Approaches relying upon object identifiers (like [38]) may classify two graphs as different even though they are isomorphic. Sampling-based approaches [17] attempt to derive non-isomorphic models on a statistical basis, but there is no formal guarantee that two models are non-isomorphic. The Alloy Analyzer [39] uses *symmetry breaking predicates as sufficient conditions* of isomorphism (i.e. excluded models are surely isomorphic). *Graph shapes provide a necessary condition* for isomorphism, i.e. if two models have identical shapes, one of them is discarded even though they may be non-isomorphic.

## **5 Evaluation**

In this section, we provide an empirical evaluation of our diversity metrics and model generation technique to address the following research questions:

**RQ1:** How effective is our technique in creating diverse models for testing?
**RQ2:** How effective is our technique in creating diverse test suites?
**RQ3:** Is there a correlation between diversity metrics and mutation score?

**Target Domain.** In order to answer those questions, we executed model generation campaigns on a DSL extracted from Yakindu Statecharts (as proposed in [35]). We used the partial metamodel describing the state hierarchy and transitions of statecharts (illustrated in Fig. 1, containing 12 classes and 6 references). Additionally, we formalized 10 WF constraints regulating the transitions as graph predicates, based on the built-in validation of Yakindu.

For mutation testing, we used a constraint or negation omission operator (CO and NO) to inject an error into the original WF constraint in every possible way, which yielded 51 mutants from the original 10 constraints (but some mutants may never have matches). We checked both the original and mutated versions of the constraints for each instance model, and a model kills a mutant if there is a difference in the match sets of the two constraints. The mutation score for a test suite (i.e. a set of models) is the total number of mutants killed that way.

**Compared Approaches.** Our test input models were taken from three different sources. First, we generated models with our iterative approach using a graph solver (**GS**) with different neighborhoods for ranges **r=1** to **r=3**.

Next, we generated models for the same DSL using **Alloy** [39], a well-known SAT-based relational model finder. For representing EMF metamodels we used traditional encoding techniques [8,32]. To enforce model diversity, Alloy was configured with three different setups for symmetry breaking predicates: **s=0**, **s = 10** and **s = 20** (the default value); for greater values the tool produced the same set of models. We used the latest 4.2 build of Alloy with the default Sat4j [20] as back-end solver. All other configuration options were set to their defaults.

Finally, we included 1250 manually created statechart models in our analysis (marked by **Human**). The models were created by students as solutions for similar (but not identical) statechart modeling homework assignments [43] representing real models which were *not* prepared for testing purposes.

**Measurement Setup.** To address **RQ1**–**RQ3**, we created a two-step measurement setup. In **Step I.** a set of instance models is generated with all **GS** and **Alloy** configurations. Each tool in each configuration generated a sequence of 30 instance models produced by subsequent solver calls, and each sequence is repeated 20 times (so 1800 models are generated for both **GS** and **Alloy**). In

**Fig. 6.** Mutation Scores and Diversity properties of models sets

case of **Alloy**, we prevented the deterministic run of the solver to enable statistical analysis. The model generators were set to create instances compliant with the structural constraints of Subsect. 2.1 but ignoring the WF constraints. The target model size was set to 30 objects as Alloy did not scale with increasing size (the scalability and the details of the back-end solver are reported in [33]). The size of **Human** models ranges from 50 to 200 objects.

In **Step II.**, we evaluate the mutation score for all the models (and for the entire sequence) by comparing the results for the mutant and original predicates, and record which mutants were killed by a model. We also calculate our diversity metrics for a neighborhood range where no more equivalence classes are produced by shapes (which turned out to be r = 7 in our case study). We calculated the internal diversity of each model, the external diversity (distance) between pairs of models in each model sequence, and the coverage of each model sequence.

**RQ1: Measurement Results and Analysis.** Figure 6a shows the distribution of the number of mutants killed by at least one model from a model sequence (left box plot), and the distribution of internal diversity (right box plot). For killing mutants, **GS** was the best performer (regardless of the range **r**): most models found 36–41 mutants out of 51. On the other hand, **Alloy**'s performance varied based on the symmetry value: for **s=0**, most models found 9–15 mutants (with a large number of positive outliers that found several errors). For **s = 10**, the average increased to over 20, but the number of positive outliers simultaneously dropped. Finally, with the default settings (**s = 20**) **Alloy** generated similar models, and found only a low number of mutants. We also measured the efficiency of killing mutants by **Human**, which was between **GS** and **Alloy**. None of the instance models could find more than 41 mutants, which suggests that those mutants cannot be detected at all by metamodel-compliant instances.

The right side of Fig. 6a presents the internal diversity of models measured as shape nodes/graph nodes (for fixpoint range 7). The results are similar: the diversity was high with low variance for **GS**, with slight differences between ranges. In case of **Alloy**, the diversity is similarly affected by the symmetry value: **s=0** produced low average diversity, but a high number of positive outliers. With **s = 10**, the average diversity increased with a decreasing number of positive outliers. Finally, with the default **s = 20** value, the average diversity was low. The internal diversity of **Human** models is between **GS** and **Alloy**.

**Fig. 7.** Mutation score and set coverage for model sequences

Figure 6b illustrates the average distance between all model pairs generated in the same sequence (vertical axis) for range 7. The distribution of external diversity also shows similar characteristics as Fig. 6a: **GS** provided high diversity for all ranges (56 out of the maximum 60), while the diversity between models generated by **Alloy** varied based on the symmetry value.

*As a summary, our model generation technique consistently outperformed Alloy wrt. both the diversity metrics and mutation score for individual models.*

**RQ2: Measurement Results and Analysis.** Figure 7a shows the number of killed mutants (vertical axis) by an increasing set of models (with 1 to 30 elements; horizontal axis) generated by **GS** or **Alloy**. The diagram shows the *median* of 20 generation runs to exclude the outliers. **GS** found a large number of mutants in the first model, and the number of killed mutants (36–37) increased to 41 by the 17th model, after which no further mutants were found. Again, our measurements showed little difference between ranges **r=1**, **2** and **3**. For **Alloy**, the result highly depends on the symmetry value: for **s=0** it found a large number of mutants, but the value saturated early. Next, for **s = 10**, the first model found significantly fewer mutants, and although the number increased rapidly for the first 5 models, altogether fewer mutants were killed than for **s=0**. Finally, the default configuration (**s = 20**) found the smallest number of mutants.

In Fig. 7b, the average coverage of the model sets is calculated (vertical axis) for increasing model sets (horizontal axis). The neighborhood shapes are calculated for r = 0 to 5, beyond which no significant difference is observed. Again, different configurations of symmetry-breaking predicates resulted in different characteristics for **Alloy**. However, the number of shape nodes covered by the test set was significantly higher for **GS** (791 vs. 200 equivalence classes) regardless of the range, and it increased monotonically as new models were added.

*Altogether, both the mutation score and the equivalence-class coverage of a model sequence were much better for our model generator than for Alloy.*

**RQ3: Analysis of Results.** Figure 8 illustrates the correlation between mutation score (horizontal axis) and internal diversity (vertical axis) for all generated and human models in all configurations. Considering all models (1800 **Alloy**, 1800 **GS**, 1250 **Human**), mutation score and internal diversity show a high correlation of 0.95 – while the correlation was low (0.12) for the **Human** models alone.

**Fig. 8.** Model diversity and mutation score correlation

*Our initial investigation suggests that high internal diversity provides a good mutation score, thus our metrics can potentially be good predictors in a testing context, but we cannot generalize this to a full statistical correlation.*

**Threats to Validity and Limitations.** We evaluated more than 4850 test inputs in our measurement, but all models were taken from a single domain of Yakindu statecharts with a dedicated set of WF constraints. However, our model generation approach did not use any special property of the metamodel or the WF constraints, thus we believe that similar results would be obtained for other domains. For mutation operations, we checked only omission of predicates, as extra constraints could easily yield infeasible predicates due to inconsistency with the metamodel, thus further reducing the number of mutants that can be killed. Finally, although we detected a strong correlation between diversity and mutation score with our test cases, this result cannot be generalized to statistical causality, because the generated models were not random samples taken from the universe of models. Thus additional investigations are needed to justify this correlation, and we only state that if a model is generated by either **GS** or **Alloy**, a higher diversity means a higher mutation score with high probability.

## **6 Related Work**

Diverse model generation plays a key role in testing model transformations, code generators and complete development environments [25]. Mutation-based approaches [1,11,22] take existing models and make random changes to them by applying mutation rules. A similar random model generator is used for experimentation purposes in [3]. Other automated techniques [7,12] generate models that only conform to the metamodel. While these techniques scale well for larger models, there is no guarantee that the mutated models are well-formed.

There is a wide set of model generation techniques which provide certain promises for test effectiveness. White-box approaches [1,6,14,15,31,32] rely on the implementation of the transformation and predominantly use back-end logic solvers, which lack scalability when deriving graph models.

Scalability and diversity of solver-based techniques can be improved by iteratively calling the underlying solver [19,35]. In each step a partial model is extended with additional elements as a result of a solver call. Higher diversity is achieved by avoiding the same partial solutions. As a downside, generation steps need to be specified manually, and higher diversity can be achieved only if the models are decomposable into separate well-defined partitions.

Black-box approaches [8,13,15,23] can only exploit the specification of the language or the transformation, so they frequently rely upon contracts or model fragments. As a common theme, these techniques may generate a set of simple models, and while certain diversity can be achieved by using symmetry-breaking predicates, they fail to scale for larger sizes. In fact, the effective diversity of models is also questionable since corresponding safety standards prescribe much stricter test coverage criteria for software certification and tool qualification than those currently offered by existing model transformation testing approaches.

Based on the logic-based Formula solver, the approach of [17] applies stochastic random sampling of output to achieve a diverse set of generated models by taking exactly one element from each equivalence class defined by graph isomorphism, which can be too restrictive for coverage purposes. Stochastic simulation is proposed for graph transformation systems in [40], where rule application is stochastic (and not the properties of models), but fulfillment of WF constraints can only be assured by a carefully constructed rule set.

## **7 Conclusion and Future Work**

We proposed novel diversity metrics for models based on neighbourhood shapes [28], which are true generalizations of the metamodel coverage and graph isomorphism criteria used in many research papers. Moreover, we presented a model generation technique that derives structurally diverse models by (i) calculating the shape of the previous solutions and (ii) feeding it back to an existing generator to avoid similar instances, thus ensuring high diversity among the models. The proposed generator is available as an open source tool [44].

We evaluated our approach in a mutation testing scenario for Yakindu Statecharts, an industrial DSL tool. We compared the effectiveness (mutation score) and the diversity metrics of different test suites derived by our approach and an Alloy-based model generator. Our approach consistently outperformed the Alloy-based generator for both a single model and the entire test suite. Moreover, we found that high (internal) diversity values normally result in a high mutation score, highlighting the practical value of the proposed diversity metrics.

Conceptually, our approach can be adapted to an Alloy-based model generator by adding formulae obtained from previous shapes to the input specification. However, our initial investigations revealed that such an approach does not scale well with increasing model size. While Alloy has been used as a model generator for numerous testing scenarios of DSL tools and model transformations [6,8,35,36,42], our measurements strongly indicate that it is not a justified choice as (1) Alloy is very sensitive to configurations of symmetry breaking predicates and (2) the diversity and mutation score of generated models is problematic.

**Acknowledgement.** This paper is partially supported by the MTA-BME Lendület Cyber-Physical Systems Research Group, the NSERC RGPIN-04573-16 project and the ÚNKP-17-3-III New National Excellence Program of the Ministry of Human Capacities.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Optimising Spectrum Based Fault Localisation for Single Fault Programs Using Specifications**

David Landsberg(B), Youcheng Sun, and Daniel Kroening

Department of Computer Science, University of Oxford, Oxford, UK david.landsberg@linacre.ox.ac.uk

**Abstract.** Spectrum based fault localisation determines how suspicious a line of code is with respect to being faulty as a function of a given test suite. Outstanding problems include identifying properties that the test suite should satisfy in order to improve fault localisation effectiveness subject to a given measure, and developing methods that generate these test suites efficiently.

We address these problems as follows. First, when single bug optimal measures are being used with a single-fault program, we identify a formal property that the test suite should satisfy in order to optimise fault localisation. Second, we introduce a new method which generates test data that satisfies this property. Finally, we empirically demonstrate the utility of our implementation at fault localisation on sv-comp benchmarks and the tcas program, demonstrating that test suites can be generated in almost a second with a fault identified after inspecting under 1% of the program.

**Keywords:** Software quality · Spectrum based fault localisation · Debugging

## **1 Introduction**

Faulty software is estimated to cost 60 billion dollars to the US economy per year [1] and has been single-handedly responsible for major newsworthy catastrophes<sup>1</sup>. This problem is exacerbated by the fact that debugging (defined as the process of finding and rectifying a fault) is complex and time consuming – estimated to consume 50–60% of the time a programmer spends in the maintenance and development cycle [2]. Consequently, the development of effective and efficient methods for software fault localisation has the potential to greatly reduce costs, wasted programmer time and the possibility of catastrophe.

In this paper, we advance the state of the art in lightweight fault localisation by building on research in spectrum-based fault localisation (sbfl). sbfl is one

This research was supported by the Innovate UK project 113099 SECT-AIR.

<sup>1</sup> https://www.newscientist.com/gallery/software-faults/.

© The Author(s) 2018

A. Russo and A. Schürr (Eds.): FASE 2018, LNCS 10802, pp. 246–263, 2018. https://doi.org/10.1007/978-3-319-89363-1_14

of the most prominent areas of software fault localisation research, estimated to make up 35% of published work in the field to date [3], and has been demonstrated to be efficient and effective at finding faults [4–12]. The effectiveness relies on two factors, (1) the quality of the measure used to identify the lines of code that are suspected to be faulty, and (2) the quality of the test suite used. Most research in the field has been focussed on finding improved measures [4–12], but there is a growing literature on how to improve the quality of test suites [13–20]. An outstanding problem in this field is to identify the properties that test suites should satisfy to improve fault localisation.

To address this problem, we focus our attention on improving the quality of test suites for the purposes of fault localisation on single-fault programs. Programs with a single fault are of special interest, as a recent study demonstrates that 82% of faulty programs could be repaired with a "single fix" [21], and that "when software is being developed, bugs arise one-at-a-time and therefore can be considered as single-faulted scenarios", suggesting that methods optimised for use with single-fault programs would be most helpful in practice. Accordingly, the contributions of this paper are as follows.


The rest of this paper is organized as follows. In Sect. 2, we present the formal preliminaries for sbfl and our approach. In Sect. 3, we motivate and describe a property of single-fault optimality. In Sect. 4, we present an algorithm which generates data for a given faulty program and prove that the generated data satisfies the property of single-fault optimality; in Sect. 5 we discuss implementation details. In Sect. 6 we present our experimental results, where we demonstrate the utility of an implementation of our algorithm on our benchmarks, and in Sect. 7 we present related work.

## **2 Preliminaries**

In this section we formally present the preliminaries for understanding our fault localisation approach. In particular, we describe probands, proband models, and sbfl.

#### **2.1 Probands**

Following the terminology in Steimann et al. [22], a *proband* is a faulty program together with its test suite, and can be used for evaluating the performance of


**Fig. 1.** minmax.c **Fig. 2.** Coverage matrix

a given fault localisation method. A *faulty program* is a program that fails to always satisfy a *specification*, which is a property expressible in some formal language that describes the intended behaviour of some part of the program under test (put). When a specification fails to be satisfied for a given execution (i.e., an *error* occurs), it is assumed there exist some (incorrectly written) lines of code in the program which were the cause of the error, identified as a *fault* (aka a *bug*).

*Example 1.* An example of a faulty C program is given in Fig. 1 (minmax.c, taken from Groce et al. [23]), and we shall use it as our running example throughout this paper. There are some executions of the program in which the assertion statement least <= most is violated, and thus the program fails to always satisfy the specification. The fault in this example is labelled C4, which should be an assignment to least instead of most.

A *test suite* is a collection of test cases whose result is independent of the order of their execution, where a *test case* is an execution of some part of a program. Each test case is associated with an input vector, where the n-th value of the vector is assigned to the n-th input of the given program for the purposes of a test (according to some given method of assigning values in the vector to inputs in the program). Each test suite is associated with a set of input vectors which can be used to generate the test cases. A test case *fails* (or is *failing*) if it violates a given specification, and *passes* (or is *passing*) otherwise.

*Example 2.* We give an example of a test case for the running example. The test case with associated input vector ⟨0, 1, 2⟩ is an execution in which input1 is assigned 0, input2 is assigned 1, and input3 is assigned 2; the statements labeled C1, C2 and C3 are executed, but C4 and C5 are not executed; and the assertion is not violated at termination, as least and most assume the values 0 and 2 respectively. Accordingly, we may associate a collection of test cases (a test suite) with a set of input vectors. For the running example the following ten input vectors are associated with a test suite of ten test cases: ⟨1, 0, 2⟩, ⟨2, 0, 1⟩, ⟨2, 0, 2⟩, ⟨0, 1, 0⟩, ⟨0, 0, 1⟩, ⟨1, 1, 0⟩, ⟨2, 0, 0⟩, ⟨2, 2, 2⟩, ⟨1, 2, 0⟩, and ⟨0, 1, 2⟩. Here, the first three input vectors result in error (and thus their associated test cases are failing), and the last seven do not (and thus their associated test cases are passing).

A *unit under test* uut is a concrete artifact in a program which is a candidate for being at fault. Many types of uuts have been defined and used in the literature, including methods [24], blocks [25,26], branches [16], and statements [27–29]. A uut is said to be *covered* by a test case just in case that test case executes the uut. For convenience, it will help to always think of uuts as being labeled C1, C2, ... etc. in the program itself (as they are in the running example). Assertion statements are not considered to be uuts, and we assume that each fault in the program has a corresponding uut.

*Example 3.* To illustrate some uuts for the running example (Fig. 1), we have chosen the units under test to be the statements labeled in comments marked C1, ... , C5. The assertion is labeled E, which is violated when an error occurs. To illustrate a proband, the faulty program minmax.c (described in Example 1), and the test suite associated with the input vectors described in Example 2, together form a proband.

#### **2.2 Proband Models**

In this section we define proband models, which are the principal formal objects used in sbfl. Informally, a proband model is a mathematical abstraction of a proband. We assume the existence of a given proband in which the uuts have already been identified for the faulty program and appropriately labeled C1,…,Cn, and assume a total of n uuts. We begin as follows.

**Definition 1.** *A set of coverage vectors, denoted by* **T**, *is a set* {t<sub>1</sub>,…,t<sub>|**T**|</sub>} *in which each* t<sub>k</sub> ∈ **T** *is a coverage vector defined* t<sub>k</sub> = ⟨c<sup>k</sup><sub>1</sub>,…,c<sup>k</sup><sub>n+1</sub>, k⟩, *where each* c<sup>k</sup><sub>i</sub> ∈ {0, 1} *and* k *uniquely identifies the vector.*


We also call a set of coverage vectors **T** the fault localisation *data* or a *dataset*. Intuitively, each coverage vector can be thought of as a mathematical abstraction of an associated test case which describes which uuts were executed/covered in that test case. We also use the following additional notation. If the last argument of a coverage vector in **T** is the number k, it is denoted t<sub>k</sub>, where k uniquely identifies a coverage vector in **T** and the corresponding test case in the associated test suite. In general, for each t<sub>k</sub> ∈ **T**, c<sup>k</sup><sub>i</sub> is a *coverage variable* and gives the value of the i-th argument in t<sub>k</sub>. If c<sup>k</sup><sub>n+1</sub> = 1, then t<sub>k</sub> is called a *failing* coverage vector, and *passing* otherwise. The set of failing coverage vectors/the event of an error is denoted E (such that the set of passing vectors is then E̅). Element c<sup>k</sup><sub>n+1</sub> is also denoted e<sub>k</sub> (as it describes whether the error occurred). For convenience, we may represent the set of coverage vectors **T** with a *coverage matrix*, where for all 0 < i ≤ n and t<sub>k</sub> ∈ **T** the cell intersecting the i-th column and k-th row is c<sup>k</sup><sub>i</sub> and represents whether the i-th uut was covered in the test case corresponding to t<sub>k</sub>. The cell intersecting the last column and k-th row is e<sub>k</sub> and represents whether t<sub>k</sub> is a failing or passing test case. Figure 2 is an example coverage matrix. In practice, given a program and an input vector, one can extract coverage information from an associated test case using established tools<sup>2</sup>.

*Example 4.* For the test suite given in Example 2 we can devise a set of coverage vectors **T** = {t<sub>1</sub>,…,t<sub>10</sub>} in which t<sub>1</sub> = ⟨1, 0, 1, 1, 0, 1, 1⟩, t<sub>2</sub> = ⟨1, 0, 0, 1, 1, 1, 2⟩, t<sub>3</sub> = ⟨1, 0, 0, 1, 0, 1, 3⟩, t<sub>4</sub> = ⟨1, 1, 0, 0, 0, 0, 4⟩, t<sub>5</sub> = ⟨1, 0, 1, 0, 0, 0, 5⟩, t<sub>6</sub> = ⟨1, 0, 0, 0, 1, 0, 6⟩, t<sub>7</sub> = ⟨1, 0, 0, 1, 1, 0, 7⟩, t<sub>8</sub> = ⟨1, 0, 0, 0, 0, 0, 8⟩, t<sub>9</sub> = ⟨1, 1, 0, 0, 1, 0, 9⟩, and t<sub>10</sub> = ⟨1, 1, 1, 0, 0, 0, 10⟩. Here, coverage vector t<sub>k</sub> is associated with the k-th input vector described in the list in Example 2. To illustrate how input and coverage vectors relate, we observe that t<sub>10</sub> is associated with a test case with input vector ⟨0, 1, 2⟩ which executes the statements labeled C1, C2 and C3, does not execute the statements labeled C4 and C5, and does not result in error. Consequently c<sup>10</sup><sub>1</sub> = c<sup>10</sup><sub>2</sub> = c<sup>10</sup><sub>3</sub> = 1, c<sup>10</sup><sub>4</sub> = c<sup>10</sup><sub>5</sub> = e<sub>10</sub> = 0, and k = 10, such that t<sub>10</sub> = ⟨1, 1, 1, 0, 0, 0, 10⟩ (by the definition of coverage vectors). The coverage matrix representing **T** is given in Fig. 2.
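For concreteness, the dataset above can be written down directly as Python lists. This is a sketch under our own naming (`T`, `failing` are not the paper's identifiers), with the rows following the coverage matrix of Fig. 2 as reconstructed from Examples 5 and 6:

```python
# Coverage vectors of Example 4: c_1..c_5, the error bit e, and the identifier k
# (row t_5 covers C1 and C3 only, consistent with the components of Example 5).
T = [
    [1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1, 1, 2],
    [1, 0, 0, 1, 0, 1, 3],
    [1, 1, 0, 0, 0, 0, 4],
    [1, 0, 1, 0, 0, 0, 5],
    [1, 0, 0, 0, 1, 0, 6],
    [1, 0, 0, 1, 1, 0, 7],
    [1, 0, 0, 0, 0, 0, 8],
    [1, 1, 0, 0, 1, 0, 9],
    [1, 1, 1, 0, 0, 0, 10],
]

def failing(T):
    """Identifiers k of the failing coverage vectors (those with e_k = 1)."""
    return {t[-1] for t in T if t[-2] == 1}
```

As expected, the first three test cases are the failing ones.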

**Definition 2.** *Let* **T** *be a non-empty set of coverage vectors; then* **T***'s program model* **PM** *is defined as the sequence* ⟨C<sub>1</sub>,…,C<sub>|**PM**|</sub>⟩*, where for each* C<sub>i</sub> ∈ **PM**, C<sub>i</sub> = {t<sub>k</sub> ∈ **T** | c<sup>k</sup><sub>i</sub> = 1}*.*

We often use the notation **PM<sub>T</sub>** to denote the program model **PM** associated with **T**. The final component C<sub>|**PM**|</sub> is also denoted E (denoting the event of the error). Each member of a program model is called a *program component* or *event*; if c<sup>k</sup><sub>i</sub> = 1 we say C<sub>i</sub> *occurred* in t<sub>k</sub> and that t<sub>k</sub> *covers* C<sub>i</sub>, and we say that C<sub>i</sub> is *faulty* just in case its corresponding uut is faulty. Following the definition above, each component C<sub>i</sub> is the set of vectors in which C<sub>i</sub> is covered, and components obey set-theoretic relationships. For instance, for all components C<sub>i</sub>, C<sub>j</sub> ∈ **PM**, we have ∀t<sub>k</sub> ∈ C<sub>j</sub>. c<sup>k</sup><sub>i</sub> = 1 just in case C<sub>j</sub> ⊆ C<sub>i</sub>. In general, we assume that E contains at least one coverage vector and each coverage vector covers at least one component. Members of E and E̅ are called failing/passing vectors, respectively.

*Example 5.* We use the running example to illustrate a program model. For the set of coverage vectors **T** = {t1,...,t10}, we may define a program model

<sup>2</sup> For C programs Gcov can be used, available at http://www.gcovr.com.

**PM** = ⟨C<sub>1</sub>, C<sub>2</sub>, C<sub>3</sub>, C<sub>4</sub>, C<sub>5</sub>, E⟩, where C<sub>1</sub> = {t<sub>1</sub>,…,t<sub>10</sub>}, C<sub>2</sub> = {t<sub>4</sub>, t<sub>9</sub>, t<sub>10</sub>}, C<sub>3</sub> = {t<sub>1</sub>, t<sub>5</sub>, t<sub>10</sub>}, C<sub>4</sub> = {t<sub>1</sub>, t<sub>2</sub>, t<sub>3</sub>, t<sub>7</sub>}, C<sub>5</sub> = {t<sub>2</sub>, t<sub>6</sub>, t<sub>7</sub>, t<sub>9</sub>}, and E = {t<sub>1</sub>, t<sub>2</sub>, t<sub>3</sub>}. Here, we may think of C<sub>1</sub>,…,C<sub>5</sub> as events which occur just in case a corresponding uut (the lines of code labeled C1, …, C5 respectively) is executed, and E as an event which occurs just in case the assertion least <= most is violated. C<sub>4</sub> is identified as the faulty component.
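The components of Example 5 can be recovered mechanically from the coverage matrix via Definition 2. A minimal sketch (vector identifiers stand in for the vectors themselves; `program_model` is our name, not the paper's):

```python
# Rows t_1..t_10 of the coverage matrix (Fig. 2): c_1..c_5 plus the error bit.
T = [
    (1, 0, 1, 1, 0, 1), (1, 0, 0, 1, 1, 1), (1, 0, 0, 1, 0, 1),
    (1, 1, 0, 0, 0, 0), (1, 0, 1, 0, 0, 0), (1, 0, 0, 0, 1, 0),
    (1, 0, 0, 1, 1, 0), (1, 0, 0, 0, 0, 0), (1, 1, 0, 0, 1, 0),
    (1, 1, 1, 0, 0, 0),
]

def program_model(T, n=5):
    """C_i = identifiers k of the vectors covering the i-th uut (Definition 2);
    the final component E collects the identifiers of the failing vectors."""
    components = [{k + 1 for k, t in enumerate(T) if t[i] == 1} for i in range(n)]
    E = {k + 1 for k, t in enumerate(T) if t[n] == 1}
    return components + [E]
```

Running it reproduces, e.g., C<sub>4</sub> = {t<sub>1</sub>, t<sub>2</sub>, t<sub>3</sub>, t<sub>7</sub>} and E = {t<sub>1</sub>, t<sub>2</sub>, t<sub>3</sub>}.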

**Definition 3.** *For a given proband we define a* proband model ⟨**PM**, **T**⟩*, consisting of the given faulty program's program model* **PM** *and an associated test suite's set of coverage vectors* **T***.*

Finally, we extend our setup to distinguish between samples and populations. The *population test suite* for a given program is a test suite consisting of all possible test cases for the program; a *sample test suite* is a test suite consisting of some (but not necessarily all) possible test cases for the program. All test suites are sample test suites drawn from a given population. Let ⟨**PM**, **T**⟩ be a given proband model for a given faulty program and sample test suite; we denote the *population vectors*, corresponding to the population test suite of the given faulty program, by **T**<sup>∗</sup> (with E<sup>∗</sup> and E̅<sup>∗</sup> the population failing and passing vectors in **T**<sup>∗</sup>, respectively). The *population program model* associated with the population test suite is denoted **PM**<sup>∗</sup> (aka **PM**<sup>∗</sup><sub>**T**<sup>∗</sup></sub>). ⟨**PM**<sup>∗</sup>, **T**<sup>∗</sup>⟩ is called the *population proband model*. Finally, we extend the use of asterisks to make clear that the asterisked variable is associated with a given population: each component in the population program model is superscripted with a ∗ to denote that it is a member of **PM**<sup>∗</sup> (e.g. C<sup>∗</sup><sub>1</sub>), as is each vector in the population set of vectors **T**<sup>∗</sup> (e.g., t<sup>∗</sup><sub>1</sub>) and each coverage variable in each vector t<sup>∗</sup><sub>k</sub> ∈ **T**<sup>∗</sup> (e.g., c<sup>k∗</sup><sub>1</sub>).

It is assumed that for a given sample proband model ⟨**PM**, **T**⟩ and its population proband model ⟨**PM**<sup>∗</sup>, **T**<sup>∗</sup>⟩, we have **T** ⊆ **T**<sup>∗</sup>. Intuitively, this is because a sample test suite is drawn from the population. In addition, for each i ∈ ℕ, if C<sub>i</sub> ∈ **PM** and C<sup>∗</sup><sub>i</sub> ∈ **PM**<sup>∗</sup>, then C<sub>i</sub> ⊆ C<sup>∗</sup><sub>i</sub>. Intuitively, this is because if the i-th uut is executed by a test case in the sample then it is executed by that test case in the population.

#### **2.3 Spectrum Based Fault Localisation**

We first define what a program spectrum is, as it serves as the principal formal object used in spectrum based fault localisation (sbfl).

**Definition 4.** *For each proband model* ⟨**PM**, **T**⟩ *and each component* C<sub>i</sub> ∈ **PM***, the component's program spectrum is the vector* ⟨|C<sub>i</sub> ∩ E|, |C̅<sub>i</sub> ∩ E|, |C<sub>i</sub> ∩ E̅|, |C̅<sub>i</sub> ∩ E̅|⟩*.*

Informally, |C<sub>i</sub> ∩ E| is the number of failing coverage vectors in **T** that cover C<sub>i</sub>, |C̅<sub>i</sub> ∩ E| is the number of failing coverage vectors in **T** that do not cover C<sub>i</sub>, |C<sub>i</sub> ∩ E̅| is the number of passing coverage vectors in **T** that cover C<sub>i</sub>, and |C̅<sub>i</sub> ∩ E̅| is the number of passing coverage vectors in **T** that do not cover C<sub>i</sub>. These four quantities are often denoted a<sup>i</sup><sub>ef</sub>, a<sup>i</sup><sub>nf</sub>, a<sup>i</sup><sub>ep</sub>, and a<sup>i</sup><sub>np</sub> respectively in the literature [4,7–12].

*Example 6.* For the proband model of the running example ⟨**PM**, **T**⟩ (where **PM** = ⟨C<sub>1</sub>,…,C<sub>5</sub>, E⟩ and **T** is represented by the coverage matrix in Fig. 2), the spectra for C<sub>1</sub>, …, C<sub>5</sub>, and E are ⟨3, 0, 7, 0⟩, ⟨0, 3, 3, 4⟩, ⟨1, 2, 2, 5⟩, ⟨3, 0, 1, 6⟩, ⟨1, 2, 3, 4⟩, and ⟨3, 0, 0, 7⟩ respectively.
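The four counts of Definition 4 follow directly from the coverage matrix. A sketch in Python (our names; rows encode the matrix of Fig. 2 as reconstructed from Example 5):

```python
# Rows t_1..t_10 of the coverage matrix: c_1..c_5 plus the error bit e.
T = [
    (1, 0, 1, 1, 0, 1), (1, 0, 0, 1, 1, 1), (1, 0, 0, 1, 0, 1),
    (1, 1, 0, 0, 0, 0), (1, 0, 1, 0, 0, 0), (1, 0, 0, 0, 1, 0),
    (1, 0, 0, 1, 1, 0), (1, 0, 0, 0, 0, 0), (1, 1, 0, 0, 1, 0),
    (1, 1, 1, 0, 0, 0),
]

def spectrum(T, i):
    """Program spectrum <a_ef, a_nf, a_ep, a_np> of component C_i (1-based)."""
    a_ef = sum(1 for t in T if t[i - 1] == 1 and t[-1] == 1)  # failing, covered
    a_nf = sum(1 for t in T if t[i - 1] == 0 and t[-1] == 1)  # failing, not covered
    a_ep = sum(1 for t in T if t[i - 1] == 1 and t[-1] == 0)  # passing, covered
    a_np = sum(1 for t in T if t[i - 1] == 0 and t[-1] == 0)  # passing, not covered
    return (a_ef, a_nf, a_ep, a_np)
```

Evaluating `spectrum(T, i)` for i = 1..5 reproduces the five spectra listed in Example 6.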

Following Naish et al. [7], we define a suspiciousness measure as follows.

**Definition 5.** *A suspiciousness measure* w *is a function with signature* w : **PM** → ℝ*, and maps each* C<sub>i</sub> ∈ **PM** *to a real number as a function of* C<sub>i</sub>*'s program spectrum* ⟨|C<sub>i</sub> ∩ E|, |C̅<sub>i</sub> ∩ E|, |C<sub>i</sub> ∩ E̅|, |C̅<sub>i</sub> ∩ E̅|⟩*, where this number is called the component's degree of suspiciousness.*

The higher/lower the degree of suspiciousness, the more/less suspicious C<sub>i</sub> is assumed to be with respect to being a fault. A property of some sbfl measures is *single-fault optimality* [7,30]. Using our notation we can express this property as follows:

**Definition 6.** *A suspiciousness measure* w *is* single-fault optimal *if it satisfies the following for every program model* **PM** *and all* C<sub>i</sub>, C<sub>j</sub> ∈ **PM**:

*1. If* E ⊄ C<sub>i</sub> *and* E ⊆ C<sub>j</sub>*, then* w(C<sub>j</sub>) > w(C<sub>i</sub>)*, and 2. if* E ⊆ C<sub>i</sub>*,* E ⊆ C<sub>j</sub>*,* |C<sub>i</sub> ∩ E̅| = k *and* |C<sub>j</sub> ∩ E̅| < k*, then* w(C<sub>j</sub>) > w(C<sub>i</sub>)*.*

Under the assumption that there is a single fault in the program, Naish et al. argue that a measure must have this property to be optimal [7]. Informally, the first condition demands that uuts covered by all failing test cases are more suspicious than anything else. The rationale here is that if there is only one faulty uut in the program, then it must be executed by all failing test cases (otherwise there would be some failing test case which executes no fault – which is impossible given it is assumed that all errors are caused by the execution of some faulty uut) [7,30]. The second condition demands that of two uuts covered by all failing test cases, the one which is executed by fewer passing test cases is more suspicious.

An example of a single fault optimal measure is the Naish-I measure w(C<sub>i</sub>) = a<sup>i</sup><sub>ef</sub> − a<sup>i</sup><sub>ep</sub> / (a<sup>i</sup><sub>ep</sub> + a<sup>i</sup><sub>np</sub> + 1) [31]. A framework that optimises any given sbfl measure to be single fault optimal was first given by Naish [31]. For any suspiciousness measure w scaled from 0 to 1, we can construct the *single fault optimised* version of w (written Opt<sub>w</sub>) as follows (here, we use the equivalent formulation of Landsberg et al. [4]): Opt<sub>w</sub>(C<sub>i</sub>) = a<sup>i</sup><sub>np</sub> + 2 if a<sup>i</sup><sub>ef</sub> = |E|, and w(C<sub>i</sub>) otherwise.
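Both the Naish-I measure and the Opt<sub>w</sub> construction are easy to express as code. A sketch under two assumptions of ours: spectra are passed as four plain counts, and Tarantula serves merely as an illustrative 0–1 scaled base measure (the paper does not prescribe it):

```python
def naish_i(a_ef, a_nf, a_ep, a_np):
    """Naish-I: a_ef - a_ep / (a_ep + a_np + 1)."""
    return a_ef - a_ep / (a_ep + a_np + 1)

def tarantula(a_ef, a_nf, a_ep, a_np):
    """Tarantula, scaled 0-1 (our example choice of a base measure w)."""
    fail_rate = a_ef / (a_ef + a_nf) if a_ef + a_nf else 0.0
    pass_rate = a_ep / (a_ep + a_np) if a_ep + a_np else 0.0
    return fail_rate / (fail_rate + pass_rate) if fail_rate + pass_rate else 0.0

def single_fault_optimised(w, total_failing):
    """Opt_w: a_np + 2 when the component is covered by all |E| failing
    vectors (a_ef = |E|), and w's own score otherwise."""
    def opt(a_ef, a_nf, a_ep, a_np):
        if a_ef == total_failing:
            return a_np + 2
        return w(a_ef, a_nf, a_ep, a_np)
    return opt
```

On the spectra of Example 6 (with |E| = 3), Opt<sub>Tarantula</sub> gives C<sub>4</sub> the score 6 + 2 = 8 and C<sub>1</sub> the score 0 + 2 = 2, so C<sub>4</sub> still tops the ranking.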

We now describe the established sbfl algorithm [4,7–12]. The method produces a list of program component indices ordered by suspiciousness, as a function of a set of coverage vectors **T** (taken from a proband model ⟨**PM**, **T**⟩) and a suspiciousness measure w. As the algorithm is simple, we informally describe it in three stages, as follows. First, the program spectrum for each program component is constructed as a function of **T**. Second, the indices of program components are ordered in a *suspiciousness list* according to decreasing order of suspiciousness. Third, the suspiciousness list is returned to the user, who inspects the uut corresponding to each index in the suspiciousness list in decreasing order of suspiciousness until a fault is found. We assume that in the case of ties of suspiciousness, the uut that comes earlier in the code is investigated first, and that the effectiveness of an sbfl measure on a proband is measured by the number of non-faulty uuts a user has to investigate before a fault is found.
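The three stages above can be sketched end-to-end; on the running example's coverage matrix with Naish-I this reproduces the ranking derived in Example 7 (`suspiciousness_list` and the other names are ours):

```python
# Rows t_1..t_10 of Fig. 2's coverage matrix: c_1..c_5 plus the error bit.
T = [
    (1, 0, 1, 1, 0, 1), (1, 0, 0, 1, 1, 1), (1, 0, 0, 1, 0, 1),
    (1, 1, 0, 0, 0, 0), (1, 0, 1, 0, 0, 0), (1, 0, 0, 0, 1, 0),
    (1, 0, 0, 1, 1, 0), (1, 0, 0, 0, 0, 0), (1, 1, 0, 0, 1, 0),
    (1, 1, 1, 0, 0, 0),
]

def spectrum(T, i):
    """Stage 1: spectrum <a_ef, a_nf, a_ep, a_np> of component C_i (1-based)."""
    a_ef = sum(t[i - 1] == 1 and t[-1] == 1 for t in T)
    a_nf = sum(t[i - 1] == 0 and t[-1] == 1 for t in T)
    a_ep = sum(t[i - 1] == 1 and t[-1] == 0 for t in T)
    a_np = sum(t[i - 1] == 0 and t[-1] == 0 for t in T)
    return a_ef, a_nf, a_ep, a_np

def naish_i(a_ef, a_nf, a_ep, a_np):
    return a_ef - a_ep / (a_ep + a_np + 1)

def suspiciousness_list(T, n, w):
    """Stages 2-3: component indices in decreasing order of suspiciousness,
    ties broken in favour of the uut that comes earlier in the code."""
    scores = {i: w(*spectrum(T, i)) for i in range(1, n + 1)}
    return sorted(scores, key=lambda i: (-scores[i], i))
```

With this coverage data the list comes out as [4, 1, 3, 5, 2], so the faulty uut C4 is inspected first.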

*Example 7.* We illustrate an instance of sbfl using our running minmax.c example of Fig. 1, and the Naish-I measure as an example suspiciousness measure. First, the program spectra (given in Example 6) are constructed as a function of the given coverage vectors (represented by the coverage matrix of Fig. 2). Second, the suspiciousness of each program component is computed (here, the suspiciousness values of the five components are 2.125, −0.375, 0.75, 2.875 and 0.625 respectively), and the indices of components are ordered according to decreasing order of suspiciousness. Thus we get the list ⟨4, 1, 3, 5, 2⟩. Finally, the list is returned to the user, and the uuts in the program are inspected according to this list in descending order of suspiciousness until a fault is found. In our running example, C4 (the fault) is investigated first.

## **3 A Property of Single-Fault Optimal Data**

In this section, we identify a new property for the optimality of a given dataset **T** for use in fault localisation. Throughout we make two assumptions: first, that a single bug optimal measure w is being used, and second, that there is a single bug in the given faulty program (henceforth our *two assumptions*). Let ⟨**PM**, **T**⟩ be a given sample proband model; then we have the following:

**Definition 7.** A Property of Single Fault Optimal Data*. If* **T** *is single bug optimal, then* ∀C<sub>i</sub> ∈ **PM<sub>T</sub>**. E ⊆ C<sub>i</sub> → E<sup>∗</sup> ⊆ C<sup>∗</sup><sub>i</sub>*.*

If this condition holds, then we say the dataset **T** (and its associated test suite) satisfies this property of single fault optimality. Informally, the condition demands that if a uut is covered by all failing test cases in the sample test suite, then it is covered by all failing test cases in the population. If our two assumptions hold, we argue it is desirable that a test suite satisfy this property. This is because the fault is assumed to be covered by all failing test cases in the population (similar to the rationale of Naish et al. [7]), and as uuts executed by all failing test cases in the sample are investigated first when a single fault optimal measure is being used, it is desirable that uuts not covered by all failing test cases in the population be less suspicious, in order to guarantee that the fault is found earlier. An additional benefit of knowing one's data satisfies this property is that we do not have to add any more failing test cases to the test suite, given that it is then impossible to improve fault localisation effectiveness by adding more failing test cases under our two assumptions.
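The condition of Definition 7 can be stated as a small check over a sample dataset and a (in practice unavailable, here purely illustrative) population dataset. A sketch; `satisfies_property` is our name, and vectors are rows of coverage bits ending in the error bit:

```python
def satisfies_property(sample, population, n):
    """Definition 7's condition: every component covered by all failing
    vectors in the sample is covered by all failing vectors in the population."""
    sample_failing = [t for t in sample if t[-1] == 1]
    population_failing = [t for t in population if t[-1] == 1]
    return all(
        all(t[i] == 1 for t in population_failing)
        for i in range(n)
        if all(t[i] == 1 for t in sample_failing)
    )
```

On the running example's failing vectors, a sample containing only t<sub>1</sub> violates the property (t<sub>1</sub> covers C<sub>3</sub> but t<sub>2</sub> does not), while the sample {t<sub>1</sub>, t<sub>2</sub>} satisfies it.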


**Data**: E, E<sup>∗</sup> (pre-condition: E ⊆ E<sup>∗</sup> ∧ E ≠ ∅)
**1 repeat**
**2** T ← *choose*({t<sup>∗</sup><sub>k</sub> ∈ E<sup>∗</sup> | ∃i ∈ ℕ. ∀t<sub>j</sub> ∈ E. c<sup>j</sup><sub>i</sub> = 1 ∧ c<sup>k∗</sup><sub>i</sub> = 0});
**3** E ← E ∪ T;
**4 until** T = ∅;
**5 return** E

## **4 Algorithm**

In this section we present an algorithm which outputs single fault optimal data for a given faulty program. We assume several preconditions for our algorithm.


The algorithm is formally presented as Algorithm 1. We assume that an associated sample test suite will also be available as a by-product of the algorithm in addition to producing the data E. The intuition behind the algorithm is that failing vectors are iteratively accumulated in a set E one by one, where the next failing vector added does not cover some component which is covered by all vectors already in E (the algorithm terminates if no such vector exists). The resulting set is observed to be single-fault optimal. To illustrate the algorithm we give the example below. We then give a proof of partial correctness.
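The loop above can be sketched in executable form as follows (our sketch; we assume, for illustration only, that the population E<sup>∗</sup> is available as an explicit finite set of 0/1 coverage tuples, whereas Sect. 5 computes its members on the fly):

```python
def choose(candidates):
    """Return a singleton set containing an arbitrary candidate,
    or the empty set if there are no candidates."""
    return {next(iter(candidates))} if candidates else set()

def algorithm1(E, E_star):
    """Sketch of Algorithm 1: iteratively add failing vectors from
    the population E_star that do not cover some component covered
    by all vectors already in E. Vectors are 0/1 coverage tuples."""
    E = set(E)
    n = len(next(iter(E_star)))  # number of uuts
    while True:
        candidates = {tk for tk in E_star
                      if any(all(tj[i] == 1 for tj in E) and tk[i] == 0
                             for i in range(n))}
        T = choose(candidates)
        E |= T
        if not T:
            return E
```

A newly added vector can never re-enter the candidate set (it would have to both cover and not cover the witnessing component), so the loop terminates once no component is covered by all of E but missed by some population vector.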

*Example 8.* We assume some population set of failing coverage vectors E<sup>∗</sup>, which we may identify with the set {t<sup>∗</sup><sub>1</sub>, t<sup>∗</sup><sub>2</sub>, t<sup>∗</sup><sub>3</sub>} = {⟨1, 0, 1, 1, 0, 1⟩<sub>1</sub>, ⟨1, 0, 0, 1, 1, 1⟩<sub>2</sub>, ⟨1, 0, 0, 1, 0, 1⟩<sub>3</sub>} described in the coverage matrix of Fig. 2. In reality, the population set of failing coverage vectors for this faulty program is much larger than this, but this will suffice for our example. The algorithm proceeds as follows. First, we assume E is a non-empty subset of E<sup>∗</sup>, and thus may assume E = {⟨1, 0, 1, 1, 0, 1⟩<sub>1</sub>}. Now, to evaluate step 2, we first evaluate the set {t<sup>∗</sup><sub>k</sub> ∈ E<sup>∗</sup> | ∃i ∈ ℕ. ∀t<sub>j</sub> ∈ E. c<sup>j</sup><sub>i</sub> = 1 ∧ c<sup>k∗</sup><sub>i</sub> = 0}. Intuitively, this is the set of failing vectors in the population which do not cover some component which is covered by all vectors in E. We may find a member of this set as follows. First, we must evaluate the condition for when E<sup>∗</sup> = {t<sup>∗</sup><sub>1</sub>, t<sup>∗</sup><sub>2</sub>, t<sup>∗</sup><sub>3</sub>}. Given c<sup>1</sup><sub>3</sub> = 1 holds of t<sub>1</sub>, and t<sub>1</sub> is the only member of E, and given c<sup>2∗</sup><sub>3</sub> = 0, we conclude that t<sup>∗</sup><sub>2</sub> is a member of the set. Thus, for our example we may assume that *choose* returns t<sup>∗</sup><sub>2</sub> from this set, such that T = {t<sup>∗</sup><sub>2</sub>}. So at step 3 the new version of E is E = {⟨1, 0, 1, 1, 0, 1⟩<sub>1</sub>, ⟨1, 0, 0, 1, 1, 1⟩<sub>2</sub>}. Consequently, on the next iteration of the loop the set condition will be unsatisfiable: there is no index i of a component such that both ∀t<sub>j</sub> ∈ E. c<sup>j</sup><sub>i</sub> = 1 holds (i.e., E ⊆ C<sub>i</sub>), and also c<sup>k∗</sup><sub>i</sub> = 0 holds for some vector t<sup>∗</sup><sub>k</sub> in the population (i.e., not E<sup>∗</sup> ⊆ C<sub>i</sub>).
Thus, *choose* will return the empty set, and the algorithm will terminate, returning the dataset E to the user for use in sbfl. Using the Naish-I measure with this dataset, C<sub>1</sub> and C<sub>4</sub> are associated with the largest suspiciousness score of 2.0. Thus, with single-fault optimal data alone we can find the fault C<sub>4</sub> reasonably effectively in our running example.

#### **Proposition 1.** *All datasets returned by Algorithm 1 are single-fault optimal.*

*Proof.* We show partial correctness as follows. Let ⟨**PM**<sup>∗</sup>, **T**<sup>∗</sup>⟩ be a given population proband model, where E<sup>∗</sup> ⊆ **T**<sup>∗</sup> is the population set of failing vectors, and let E be returned by the algorithm. We must show that for all C<sub>i</sub> ∈ **PM**<sub>E</sub>, E ⊆ C<sub>i</sub> → E<sup>∗</sup> ⊆ C<sup>∗</sup><sub>i</sub> (by def. of single fault optimality). We prove this by contradiction. Assume there is some C<sub>i</sub> ∈ **PM**<sub>E</sub> (without loss of generality we may assume i = 1) such that E ⊆ C<sub>1</sub> but not E<sup>∗</sup> ⊆ C<sup>∗</sup><sub>1</sub>. Given that E has been returned by the algorithm, we may assume T = ∅ (step 4), and thus that *choose* returned ∅ at step 2 (by def. of *choose*). Accordingly, there is no t<sup>∗</sup><sub>k</sub> ∈ E<sup>∗</sup> where ((∀t<sub>j</sub> ∈ E) c<sup>j</sup><sub>1</sub> = 1) ∧ c<sup>k∗</sup><sub>1</sub> = 0 (by the set condition at step 2). Thus, (∀t<sup>∗</sup><sub>k</sub> ∈ E<sup>∗</sup>) ((∀t<sub>j</sub> ∈ E) c<sup>j</sup><sub>1</sub> = 1) → c<sup>k∗</sup><sub>1</sub> = 1. Now, ((∀t<sub>j</sub> ∈ E) c<sup>j</sup><sub>1</sub> = 1) just in case E ⊆ C<sub>1</sub> (by def. of program models). So, (∀t<sup>∗</sup><sub>k</sub> ∈ E<sup>∗</sup>), if E ⊆ C<sub>1</sub> then c<sup>k∗</sup><sub>1</sub> = 1 (by substitution of equivalents). Equivalently, if E ⊆ C<sub>1</sub> then (∀t<sup>∗</sup><sub>k</sub> ∈ E<sup>∗</sup>) c<sup>k∗</sup><sub>1</sub> = 1. Now, in general it holds that ((∀t<sup>∗</sup><sub>k</sub> ∈ E<sup>∗</sup>) c<sup>k∗</sup><sub>1</sub> = 1) just in case E<sup>∗</sup> ⊆ C<sup>∗</sup><sub>1</sub> (by def. of program models). Thus E ⊆ C<sub>1</sub> → E<sup>∗</sup> ⊆ C<sup>∗</sup><sub>1</sub> (by substitution of equivalents). This contradicts the initial assumption.

Finally, we informally observe that the maximum size of the E returned is the number of uuts. In this case E is input to the algorithm with a failing vector that covers all components, and *choose* always returns a failing vector that covers one fewer uut than the failing vector covering the fewest uuts already in E (noting that we assume at least one component is always covered). The minimum size is one. In this case E is input to the algorithm with a failing vector which covers some components, and the post-condition is already fulfilled. In general, E can be much smaller than E<sup>∗</sup>.

#### **5 Implementation**

We now discuss our implementation of the algorithm. In practice, we can leverage model checkers to compute members of E<sup>∗</sup> (the population set of failing vectors) on the fly, where computing E<sup>∗</sup> as a pre-condition would usually be intractable. This can be done by appeal to an SMT solving subroutine, which we describe as follows. Given a formal model of some code F<sub>code</sub>, a formal specification φ, a set of Booleans {C<sub>1</sub>,..., C<sub>n</sub>} which are true just in case the corresponding uut is executed in a given execution, and a set E ⊆ E<sup>∗</sup>, we can use an SMT solver to return a satisfying assignment by calling SMT(F<sub>code</sub> ∧ ¬φ ∧ ⋁<sub>i : (∀t<sub>k</sub>∈E) c<sup>k</sup><sub>i</sub>=1</sub> C<sub>i</sub> = 0), and then extracting a coverage vector from that assignment. A subroutine which returns this coverage vector (or the empty set if one does not exist) can act as a substitute for the *choose* subroutine in Algorithm 1, and the generation of a static object E<sup>∗</sup> is no longer required as an input to the algorithm. Our implementation of this is called *sfo* (single fault optimal data generation tool).
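The on-the-fly *choose* can be sketched as follows (our sketch; we substitute a brute-force enumeration of coverage assignments for the real SMT call, and model F<sub>code</sub> ∧ ¬φ as a simple Boolean predicate, so every name here is illustrative only):

```python
from itertools import product

def solve(failing_formula, n):
    """Brute-force stand-in for the SMT call: yield assignments of
    the coverage Booleans C_1..C_n satisfying the predicate
    failing_formula (standing in for F_code AND NOT phi)."""
    for bits in product((0, 1), repeat=n):
        if failing_formula(bits):
            yield bits

def choose_via_solver(failing_formula, n, E):
    """Sketch of the on-the-fly choose: find a failing execution
    that misses some component covered by all vectors in E."""
    mandatory = [i for i in range(n) if all(t[i] == 1 for t in E)]
    for bits in solve(failing_formula, n):
        if any(bits[i] == 0 for i in mandatory):
            return bits
    return None  # plays the role of choose returning the empty set
```

A real implementation would pass the disjunction ⋁ C<sub>i</sub> = 0 over the mandatory indices to the solver rather than filtering enumerated models.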

We now discuss extensions of *sfo*. It is known that adding passing executions helps in sbfl [4,5,7–12]; thus, to develop a more effective fault localisation procedure, we developed a second implementation *sfo<sup>p</sup>* (*sfo* with passing traces) that runs *sfo* and then adds passing test cases. To do this, after running *sfo* we call an SMT solver 20 times to find up to 20 new passing executions, where on each call, if the vector found has new coverage properties (i.e., it does not cover exactly the same uuts as some passing vector already computed), it is added to the set of passing vectors.

Our implementations of *sfo* and *sfo<sup>p</sup>* are integrated into a branch of the model checker cbmc [32]. Our branch of the tool is available for download at the URL given in the footnote<sup>3</sup>. In addition to generating fault localisation data, our implementations rank uuts by degree of suspiciousness according to the Naish-I measure and report this ranking to the user.

## **6 Experimentation**

In this section we provide details of evaluation results for the use of *sfo* and *sfo<sup>p</sup>* in fault localisation. The purpose of the experiment is to demonstrate that implementations of Algorithm 1 can be used to facilitate efficient and effective fault localisation in practice on small programs (≤2.5kloc). We think generation of fault localisation information in a few seconds (≤2) is sufficient to demonstrate practical efficiency, and ranking the fault in the top handful of the most suspicious lines of code (≤5) on average is sufficient to demonstrate practical effectiveness. In the remainder of this section we present our experimental setup (where we describe our scoring system and benchmarks), and our results.

#### **6.1 Setup**

For the purposes of comparison, we tested the fault localisation potential of *sfo* and *sfo<sup>p</sup>* against a method named *1f*, which performs sbfl when only a single failing test case is generated by cbmc (and thus all uuts covered by that test case are equally suspicious). We used the following scoring method to evaluate the effectiveness of each method on each benchmark. We envisage an engineer who inspects each loc in descending order of suspiciousness using a given strategy (inspecting lines that appear earlier in the code first in the case of ties). We rank alternative techniques by the number of non-faulty loc that are investigated until the engineer finds a fault. Finally, we report the average of these scores over the benchmarks to give us an overall measure of fault localisation effectiveness.
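The scoring method can be sketched as follows (our sketch; the line numbers and scores below are hypothetical):

```python
def localisation_score(suspiciousness, faulty_lines):
    """suspiciousness: dict mapping line number -> score.
    Return the number of non-faulty lines inspected before a fault
    is found, inspecting in descending score order and breaking
    ties by earlier line number."""
    order = sorted(suspiciousness, key=lambda ln: (-suspiciousness[ln], ln))
    inspected = 0
    for ln in order:
        if ln in faulty_lines:
            return inspected
        inspected += 1
    return inspected  # no faulty line was ranked
```

For example, if lines 4 and 10 tie on the top score and line 10 is faulty, line 4 is inspected first and the score is 1.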

<sup>3</sup> https://github.com/theyoucheng/cbmc.

We now discuss the benchmarks used in our experiments. In order to perform an unbiased experiment, we required that our benchmarks satisfy the following three properties (aside from being C programs on which cbmc could be used):


Unfortunately, benchmarks satisfying these conditions are rare. In practice, benchmarks exist in verification research that satisfy either the second or third criterion, but rarely both. For instance, the available sir benchmarks satisfy the third criterion, but not the second<sup>4</sup>. The software verification competition (sv-comp) benchmarks satisfy the second criterion, but almost never satisfy the third<sup>5</sup>. Furthermore, it is often difficult to obtain benchmarks from authors even when usable benchmarks do in fact exist. Finally, we have been unable to find an instance of a C program that was not artificially developed for the purposes of testing.

The benchmarks are described in Table 1, where we give the benchmark name, the number of faults in the program, and the lines of code (loc). The modified versions of tcas were made available by Alex Groce via personal correspondence and were used with the Explain tool in [33]<sup>6</sup>. The remaining benchmarks were identified as usable by manual investigation and testing in the repositories of sv-comp 2013 and 2017. We have made our benchmarks available for download directly from the link in footnote 4. Faults in sv-comp programs were identified by comparing them to an associated fault-free version (in tcas the fault was already identified). A series of contiguous lines of code that differed from the fault-free version (usually one line, and rarely up to 5 loc for larger programs) constituted one fault. loc were counted using the cloc utility.

We give further details about our application of cbmc in this experiment. For all our benchmarks, we used the smallest unwinding number that enables the bounded model checker to find a counterexample. These counterexamples were sliced, which usually results in a large improvement in fault localisation. For details about unwindings and slicing see the cbmc documentation [34]. In each benchmark, each executable statement (variable initialisation, assignment, or condition statement) was treated as a uut.

<sup>4</sup> http://sir.unl.edu/portal/index.php.

<sup>5</sup> Benchmarks can be accessed at https://sv-comp.sosy-lab.org/2018/.

<sup>6</sup> For our experiment we activated assertion statement P5a and fault 32c.

#### **6.2 Results and Discussion**

In this section we discuss our experimental results. In Table 1, columns *1f*/*sfo<sup>p</sup>*/*sfo* give the scores for when the respective method is used. The t columns give the runtimes for cbmc and *sfo<sup>p</sup>*, respectively (we omit the runtime for *sfo*, since the difference is negligible). The following two columns give the number of failing and passing test cases generated by *sfo<sup>p</sup>*. The AVG row gives the averages of the column values. We are primarily interested in comparing the scores of *sfo<sup>p</sup>* and *1f*.


**Table 1.** Experimental results

We now discuss the results of the three techniques *1f* , *sfo* and *sfop*. On average, *1f* located a fault after investigating 17.23 lines of code (4.09% of the program on average). The results here are perhaps better than expected. We observed that the single failing test case consistently returned good fault localisation potential given the use of slicing by the technique.

We now discuss *sfo*. On average, *sfo* located a fault after investigating 16 lines of code (3.8% of the program on average). Thus, the improvement over *1f* is very small. We emphasise that when only one failing test case was available for *sfo* (i.e., |E| = 1), the SMT solver could not find any other failing traces covering different parts of the program. In such cases, *sfo* performed the same as *1f* (as expected). However, when there was more than one failing test case available (i.e., |E| > 1), *sfo* always made a small improvement. Accordingly, for benchmarks 1, 2, 3, 5, 9, and 12 the improvements in terms of fewer loc examined are 2, 6, 3, 1, 2, and 3, respectively. An improvement on benchmarks where *sfo* generated more than one test case is to be expected, given that there was always a fault covered by all failing test cases in each program (even in programs with multiple faults), thus taking advantage of the property of single fault optimal data. Finally, we conjecture that this improvement will be larger on programs with more failing test cases available in the population, and on longer faulty programs.

We now discuss *sfop*. On average, *sfo<sup>p</sup>* located a fault after investigating 4.08 loc (0.97% of each program on average). Thus, the improvement over the other techniques is quite large (four times as effective as *1f* ). Moreover, this effectiveness came at very little expense to runtime – *sfo<sup>p</sup>* had an average runtime of 1.06 s, which is comparable to the runtime of *1f* of 0.78 s. This is despite the fact that *sfo<sup>p</sup>* generated over 7 executions on average. We consequently conclude that implementations of Algorithm 1 can be used to facilitate efficient and effective fault localisation in practice on small programs.

## **7 Related Work**

The techniques discussed in this paper improve the quality of data usable for sbfl. We divide the research in this field into the following areas; many of these methods can potentially be combined with our technique.

*Test Suite Expansion.* One approach to improving test suites is to add more test cases which satisfy a given criterion. A prominent criterion is that the test suite has sufficient program coverage, where studies suggest that test suites with high coverage improve fault localisation [15–17,20]. Other ways to improve test suites for sbfl are as follows. Li et al. generate test suites for sbfl, considering failing to passing test case ratio to be more important than number [35]. Zhang et al. consider cloning failed test cases to improve sbfl [13]. Perez et al. develop a metric for diagnosing whether a test suite is of sufficient quality for sbfl to take place [14]. Li et al. consider weighing different test cases differently [36]. Aside from coverage criteria, methods have been studied which generate test cases with a minimal distance from a given failed test case [18]. Baudry et al. use a bacteriological approach in order to generate test suites that simultaneously facilitate both testing and fault localisation [19]. Concolic execution methods have been developed to add test cases to a test suite based on their similarity to an initial failing run [20].

Prominent approaches which leverage model checkers for fault localisation are as follows. Groce [33] uses integer linear programming to find a passing test case most similar to a failing one and then compares the difference. Schuppan and Biere [37] generate short counterexamples for use in fault localisation, where a short counterexample will usually mean fewer uuts for the user to inspect. Griesmayer [38] and Birch et al. [39] use model checkers to find failing executions and then check whether a given number of changes to the values of variables can make the counterexample disappear. Gopinath et al. [40] compute minimal unsatisfiable cores in a given failing test case, where statements in the core are given a higher suspiciousness level in the spectra ranking. Additionally, when generating a new test, they generate an input whose test case is most similar to the initial run in terms of its coverage of the statements. Fey et al. [41] use SAT solvers to localise faults in hardware with LTL specifications. In general, experimental scale is limited to a small number of programs in these studies, and we think our experimental component provides an improvement in terms of experimental scale (13 programs).

*Test Suite Reduction.* An alternative approach to expanding a test suite is to use reduction methods. Recently, many approaches have demonstrated that it is not necessary for all test cases in a test suite to be used. Rather, one can select a handful of test cases in order to minimise the number of test cases required for fault localisation [42,43]. Most approaches are based on a strategy of eliminating redundant test cases relative to some coverage criterion. The effectiveness of applying various coverage criteria in test suite reduction is traditionally based on empirical comparison of two metrics: one which measures the size of the reduction, and the other which measures how much fault detection is preserved.

*Slicing.* A prominent approach to improving the quality of test suites involves slicing test cases. Here, sbfl proceeds as usual except that the program and/or the test cases composing the test suite are sliced (with irrelevant lines of code/parts of the execution removed). For example, Alves et al. [44] combine Tarantula with dynamic slices, and Ju et al. [45] use sbfl in combination with both dynamic and execution slices. Syntactic dynamic slicing is built into all our tested approaches by appeal to the functionality of cbmc.

To our knowledge, no previous methods generate data which exhibit our property of single fault optimality.

## **8 Conclusion**

In this paper, we have presented a method to generate single fault optimal data for use with sbfl. Experimental results on our implementation *sfo<sup>p</sup>*, which integrates single fault optimal data along with passing test cases, demonstrate that small optimized fault localisation data can be generated efficiently in practice (1.06 s on average), and that subsequent fault localization can be performed effectively using this data (investigating 4.06 loc until a fault is found). We envisage that implementations of the algorithm can be used in two different scenarios. In the first, the generated test suite is used in standalone fault localisation, providing a small and low cost test suite useful for repeated iterations of simultaneous testing and fault localisation during program development. In the second, the generated data is added to any pre-existing data associated with a test suite, which may be useful at the final testing stage where we may wish to optimise single fault localisation.

Future work involves finding larger benchmarks on which to use our implementation, developing further properties, and developing methods for programs with multiple faults. We would also like to combine our technique with existing test suite generation algorithms in order to investigate how much test suites can be further improved for the purposes of fault localization.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **TCM: Test Case Mutation to Improve Crash Detection in Android**

Yavuz Koroglu(B) and Alper Sen

Department of Computer Engineering, Bogazici University, Istanbul, Turkey *{*yavuz.koroglu,alper.sen*}*@boun.edu.tr

**Abstract.** GUI testing of mobile applications has gradually become a very important topic over the last decade with the growing mobile application market. We propose Test Case Mutation (TCM), which mutates existing test cases to produce richer test cases. These mutated test cases detect crashes that are not detected by the existing test cases. TCM differs from the well-known Mutation Testing (MT), where mutations are inserted into the source code of an Application Under Test (AUT) to measure the quality of test cases; in TCM, we instead modify existing test cases to obtain new ones and increase the number of detected crashes. Android applications take the largest portion of the mobile application market. Hence, we evaluate TCM on Android by replaying mutated test cases of 100 randomly selected AUTs from the F-Droid benchmarks. We show that TCM is effective at detecting new crashes within a given time budget.

#### **1 Introduction**

As of April 2016, there are over 2.6 billion smartphone users worldwide, and this number is expected to grow [1]. Over the last decade there has been an increasing focus on mobile application testing in top testing conferences and journals [2]. Android applications have the largest share of the mobile application market, with 82.8% of all mobile applications designed for Android [1]. Therefore, we focus on Android GUI testing in this paper.

The main idea of TCM is to mutate existing test cases to produce richer test cases in order to increase the number of detected crashes. We first identify typical crash patterns that exist in Android applications. Then, we develop mutation operators based on these crash patterns. Typically mutation operators are applied to the source code of applications. However, in our work we apply them to test cases.

Typical crash patterns in Android are Unhandled Exceptions, External Errors, Resource Unavailability, Semantic Errors, and Network-Based Crashes [3]. We describe one case study for each crash pattern. We define six novel mutation operators (Loop-Stressing, Pause-Resume, Change Text, Toggle Contextual State, Remove Delays, and Faster Swipe) and relate them to these five crash patterns.

**Fig. 1.** TCM overview

We implement TCM on top of AndroFrame [4], a fully automated Android GUI testing tool. We give an overview of TCM in Fig. 1. First, we generate a test suite for the Application Under Test (AUT) using AndroFrame. AndroFrame obtains an AUT Model which is represented as an Extended Labeled Transition System (ELTS). We then minimize the Generated Test Suite using the AUT Model in order to reduce test execution costs (Test Suite Minimization). We apply Test Case Mutation (TCM) on the Minimized Test Suite and obtain a Mutated Test Suite. We use AndroFrame to execute the Mutated Test Suite and collect Test Results.

We state our contributions as follows:


## **2 Background**

In this section, we first describe the basics of the Android GUI to facilitate the understanding of our paper.

Android GUI is based on *activities*, *events*, and *crashes*. An *activity* is a container for a set of GUI components. These GUI components can be seen on the Android screen. Each GUI component has properties that describe boundaries of the component in pixels (x1, y1, x2, y2) and how the user can interact with the component (*enabled*, *clickable*, *longclickable*, *scrollable*, *password*). Each GUI component also has a *type* property from which we can understand whether the component accepts text input. A GUI component accepts text input if its *password* property is *true* or its *type* is *EditText*.
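For illustration, the text-input rule above can be written as a small predicate (our sketch; a component is modelled as a dict of the properties listed above):

```python
def accepts_text(component):
    """A GUI component accepts text input iff its password property
    is true or its type is EditText (per the definition above)."""
    return component.get("password", False) or component.get("type") == "EditText"
```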


**Table 1.** List of GUI actions

The Android system and the user can interact with GUI components using *events*. We divide events into two categories: *system events* and *GUI events (actions)*. We show the list of GUI actions that we use in Table 1, which covers more actions than are typically used in the literature. Note that the GUI actions in Table 1 are possible inputs from the user, whereas system events are not. We group actions into three categories: non-contextual, contextual, and special. Non-contextual actions correspond to actions that are triggered by user gestures. *Click* and *longclick* take two parameters, the x and y coordinates to click on. *Text* takes three parameters, the x and y coordinates and a string describing what to write. *Swipe* takes five parameters: the first four describe the starting and ending coordinates, and the fifth adjusts the speed of the *swipe*. *Menu* and *back* take no parameters; these actions simply press the menu and back buttons of the mobile device, respectively. Contextual actions correspond to the user changing the contextual state of the AUT. The contextual state is the concatenation of the global attributes of the mobile device (internet connectivity, bluetooth, location, planemode, sleeping). The *connectivity* action adjusts the internet connectivity of the mobile device (adjusting wifi or mobile data according to which is available for the device). *Bluetooth*, *location*, and *planemode* are straightforward. The *doze* action taps the power button of the mobile device and puts the device to sleep or wakes it; we use *doze* to pause and resume the AUT. Our only special action is *reinitialize*, which reinstalls and restarts an AUT. System events are system-generated events, e.g., *battery level*, *receiving SMS*, and *clock/timer* events.

We report a *crash* whenever a fatal exception is recorded in the Android logs, similar to previous work [3,5]. Crashes often result in the AUT terminating with or without a warning. Some crashes do not visually affect the execution, but the AUT halts as a result.

We use the *Extended Labeled Transition System* (ELTS) [6] as a model for the AUT. Formally, an ELTS M = (V,v0, Z, ω, λ) is a 5-tuple, where


We define a GUI state, or simply a *state* v to be the concatenation of the (1) package name (a name representing the AUT), (2) activity name, (3) contextual state, and (4) GUI components.

Each state v has a set of enabled actions λ(v), extracted from its set of GUI components. We say that a GUI action, or simply an *action* z ∈ λ(v) *is enabled* at state v iff we can deduce that z interacts with at least one GUI component in v.

A *transition* is a 3-tuple (start-state, end-state, action), shortly denoted by (v<sub>s</sub>, v<sub>e</sub>, z). We extend the standard transition and define a *delayed transition* as a 4-tuple (start-state, end-state, action, delay in seconds), shortly denoted by (v<sub>s</sub>, v<sub>e</sub>, z, d). We do this so that we can later change the duration of transitions via mutation. We define an execution trace, or simply a *trace* t, as a sequence of delayed transitions. An example trace is t = (v<sub>1</sub>, v<sub>2</sub>, z<sub>1</sub>, d<sub>1</sub>), (v<sub>2</sub>, v<sub>3</sub>, z<sub>2</sub>, d<sub>2</sub>), ..., (v<sub>n</sub>, v<sub>n+1</sub>, z<sub>n</sub>, d<sub>n</sub>), where n is the *length* of the trace.

We say that a trace t is a *test case* if the first state of the trace is the initial state v<sup>0</sup> (the GUI state when the AUT is started). A *test suite* T S is a set of test cases. AndroFrame generates these test suites. Then, TCM applies minimization and mutation to generate new test suites.
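These definitions can be transcribed as lightweight types (our sketch; states and actions are simplified to strings):

```python
from typing import List, NamedTuple

class DelayedTransition(NamedTuple):
    # (start-state, end-state, action, delay in seconds)
    start: str
    end: str
    action: str
    delay: float

Trace = List[DelayedTransition]

def is_test_case(t: Trace, v0: str) -> bool:
    """A trace is a test case iff its first transition starts at
    the initial state v0 of the AUT."""
    return bool(t) and t[0].start == v0
```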

### **3 Android Crash Patterns and Mutation Operators**

In this section, we first describe typical crash patterns for Android applications based on related work in the literature [3]. We give a list of the crash patterns in Table 2 and describe them below.

#### **3.1 Android Crash Patterns**

**C1. Unhandled Exceptions.** An AUT may crash due to misuse of libraries or GUI components, e.g. overuse of a third party library (stressing) may cause the third party library to crash.

**C2. External Errors.** An AUT may communicate with external applications. This communication requires either permissions or valid Inter Process Communication (IPC) for Android. There are three types of IPC in Android: intents, binders, and shared memory. Intents are used to send messages between applications; these messages are called bundles. Binders are used to invoke methods of other applications. An AUT may crash with an external error because (1) the AUT attempts to communicate with another application without sufficient permissions, (2) the AUT receives an intent with an invalid bundle from another application, (3) the AUT sends an intent with an invalid bundle and fails to receive an answer due to a crash in the other application, (4) another application uses a binder with illegal arguments, (5) the AUT uses a binder on another application with illegal arguments and fails to receive the return value due to a crash in the other application, or (6) the shared memory of the AUT is freed by another application.


**Table 2.** Relating crash patterns and mutation operators

**C3. Resource Unavailability.** In Android, an AUT may be paused at any time by executing an *onPause()* method. This method is very brief and does not necessarily afford enough time to perform save operations. The *onPause()* method may terminate prematurely if its operations take too much time, causing a resource unavailability problem that may crash the AUT when it is resumed. Another problem is that an AUT may use one or more system resources such as memory and sensor handlers (e.g. orientation) during execution. When the AUT is paused, it releases system resources. The AUT may crash if it is unable to allocate these resources back when it is resumed.

**C4. Semantic Errors.** An AUT may crash if it fails to handle certain inputs given by the user. For example, an AUT may crash instead of generating a warning if some textbox is left empty or contains unexpected text.

**C5. Network-Based Crashes.** An AUT may connect with remote servers or peers via *bluetooth* or *wifi*. The AUT may crash and terminate if it does not handle the cases where the server is unreachable, the connectivity is disabled, or the communicated data causes an error in the AUT.

#### **3.2 Mutation Operators**

We now define the set of Android mutation operators that we developed. We denote these operators by Δ. We describe these mutation operators, then relate them to the crash patterns above, and summarize these relations in Table 2.

**Definition 1.** *A* mutation operator δ *is a function which takes a test case* t *and returns a new test case* t′*. We denote a mutation as* t′ = δ(t)*.*

**M1. Loop-Stressing (*δ*<sub>LS</sub>).** t′ = δ<sub>LS</sub>(t) re-executes all looping actions of a test case t multiple times with a d′ second delay. An action z<sub>i</sub> of a delayed transition t<sub>i</sub> = (v<sub>i</sub>, v<sub>i+1</sub>, z<sub>i</sub>, d<sub>i</sub>) in t is *looping* iff v<sub>i+1</sub> = v<sub>i</sub>. Let t<sub>j...k</sub> denote the subsequence of actions between the jth and kth indices of test case t, inclusive. Then,

$$\delta\_{\rm LS}(t) = t\_1^{ls} \cdot t\_2^{ls} \cdot \dots \cdot t\_n^{ls} \text{ where } t\_i^{ls} = \begin{cases} \underbrace{t\_i' \cdot t\_i' \cdot \dots \cdot t\_i'}\_{m \text{ times}} & v\_i = v\_{i+1} \\ t\_i & \text{otherwise} \end{cases} \tag{1}$$

Here $n$ is the length of test case $t$ and $t\_i' = (v\_i, v\_{i+1}, z\_i, d)$. We pick $d = 1$ to avoid a double-click, which may be programmed as a separate action from a single click. We pick $m = 9$ for two reasons. First, in our case studies, we did not encounter a crash when $m < 9$. Second, although we detect the same crashes when $m > 9$, we want to keep $m$, and hence the test cases, as small as possible. Loop-stressing may lead to an unhandled exception (C1) by stressing third-party libraries through repeated invocation. It may also lead to an external error (C2) if it stresses another application until that application crashes.
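As an illustration, loop-stressing can be sketched in Python. The 4-tuple transition encoding and all names below are ours, not AndroFrame's; this is a minimal sketch, not the paper's implementation.

```python
from typing import List, NamedTuple

class Transition(NamedTuple):
    """A delayed transition (v_start, v_end, action, delay) as defined in the paper."""
    v_start: int
    v_end: int
    action: str
    delay: float

def loop_stress(t: List[Transition], m: int = 9, d: float = 1.0) -> List[Transition]:
    """M1: repeat every looping transition (v_end == v_start) m times with delay d;
    non-looping transitions are kept unchanged."""
    mutated: List[Transition] = []
    for tr in t:
        if tr.v_end == tr.v_start:                    # looping action
            mutated.extend([tr._replace(delay=d)] * m)
        else:
            mutated.append(tr)
    return mutated
```

The `d = 1.0` default mirrors the paper's choice of a one-second delay to avoid accidental double-clicks.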

**M2. Pause-Resume ($\delta\_{\rm PR}$).** $t' = \delta\_{\rm PR}(t)$ adds two consecutive *doze* actions before each transition of the test case $t$. Let $t\_i^{pr} = (v\_i, \textit{doze off}, 2) \cdot (v\_i, \textit{doze on}, 2)$. Then,

$$\delta\_{\rm PR}(t) = t\_1^{pr} \cdot t\_1 \cdot t\_2^{pr} \cdot t\_2 \cdot \dots \cdot t\_n^{pr} \cdot t\_n \tag{2}$$

Pause-resume may trigger a crash due to resource unavailability (C3).
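A pause-resume sketch in Python follows. The doze actions are encoded here as self-loop transitions with a 2-second delay; the encoding and names are our illustrative assumptions.

```python
from typing import List, NamedTuple

class Transition(NamedTuple):
    v_start: int
    v_end: int
    action: str
    delay: float

def pause_resume(t: List[Transition]) -> List[Transition]:
    """M2: insert a doze-off/doze-on pair (2 s delay each) before every transition,
    forcing the AUT through onPause()/onResume() repeatedly."""
    mutated: List[Transition] = []
    for tr in t:
        mutated.append(Transition(tr.v_start, tr.v_start, "doze off", 2.0))
        mutated.append(Transition(tr.v_start, tr.v_start, "doze on", 2.0))
        mutated.append(tr)
    return mutated
```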

**M3. Change Text ($\delta\_{\rm CT}$).** We assume that existing test cases contain well-behaving text inputs to explore the AUT as much as possible. To increase the number of detected crashes, we modify the contents of these texts.

$t' = \delta\_{\rm CT}(t)$ first picks one random abnormal text manipulation operation and applies it to a random *textentry* action of the existing test case $t$. The abnormal text manipulation operations are *emptytext*, *dottext*, and *longtext*, where *emptytext* deletes the text, *dottext* enters a single dot character, and *longtext* enters a random string of length >200.

Let $z\_i^{ct}$ denote a random abnormal text manipulation action, where $z\_i$ is a text action, and let $d\_i^{ct}$ denote the new delay required to completely execute $z\_i^{ct}$. We define $t' = \delta\_{\rm CT}(t)$ on test cases as follows:

$$\delta\_{\rm CT}(t) = \begin{cases} \displaystyle t & \nexists z\_i = \textit{textentry} \\ t\_{1...i-1} \cdot t\_i^{ct} \cdot t\_{i+1...n} & \text{otherwise} \end{cases} \tag{3}$$

where $n$ is the length of $t$ and $t\_i^{ct} = (v\_i, v\_{i+1}, z\_i^{ct}, d\_i^{ct})$. An AUT may crash because the corresponding *onTextChange()* method of the AUT throws an unhandled exception (C1). The AUT may also crash if the content of the text is an unexpected kind of input, which causes a semantic error later (C4).
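The change-text operator can be sketched as below. The `textentry:` action encoding and the replacement delay of 5 s are our illustrative assumptions (the paper leaves $d\_i^{ct}$ tool-specific).

```python
import random
from typing import List, NamedTuple

class Transition(NamedTuple):
    v_start: int
    v_end: int
    action: str
    delay: float

def change_text(t: List[Transition], rng=random) -> List[Transition]:
    """M3: replace one random textentry action with an abnormal text input
    (emptytext, dottext, or longtext). Returns t unchanged if no text action exists."""
    idxs = [i for i, tr in enumerate(t) if tr.action.startswith("textentry")]
    if not idxs:
        return list(t)                                  # no textentry action in t
    i = rng.choice(idxs)
    abnormal = rng.choice([
        "textentry:",                                   # emptytext: delete the text
        "textentry:.",                                  # dottext: a single dot
        "textentry:" + "x" * 201,                       # longtext: length > 200
    ])
    # delay 5.0 stands in for the new delay d_i^ct needed to execute the action
    return t[:i] + [t[i]._replace(action=abnormal, delay=5.0)] + t[i + 1:]
```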

**M4. Toggle Contextual State ($\delta\_{\rm TCS}$).** Existing test suites typically lack contextual actions, where the condition of the contextual state is crucial to generate the crash. Therefore, we introduce contextual state toggling with $t' = \delta\_{\rm TCS}(t)$, which is defined as follows:

$$\delta\_{\rm TCS}(t) = t\_1 \cdot t\_1^{tcs} \cdot t\_2 \cdot t\_2^{tcs} \cdot \dots \cdot t\_n \cdot t\_n^{tcs} \tag{4}$$

where $n$ is the length of test case $t$ and $t\_i^{tcs}$ is a contextual action transition $(v\_{i+1}, v\_{i+1}, z^{tcs}, d)$, where $z^{tcs}$ corresponds to a random contextual toggle action. We pick $d = 10$ s for each contextual action since Android may take a long time to stabilize after a change of contextual state. Toggling the contextual states of the AUT may result in an external error (C2), or a network-based crash if connection failures are not handled correctly (C5).

**M5. Remove Delays ($\delta\_{\rm RD}$).** $t' = \delta\_{\rm RD}(t)$ takes a test case $t$ and sets all of its delays to 0. When replayed, the events of $t'$ occur in the same order as in $t$, but are sent to the AUT at the earliest possible time.

$$\delta\_{\rm RD}(t) = (v\_1, v\_2, z\_1, 0) \cdot (v\_2, v\_3, z\_2, 0) \cdot \dots \cdot (v\_n, v\_{n+1}, z\_n, 0) \tag{5}$$

If the AUT is communicating with another application, removing delays may cause the requests to crash the other application. If this case is not handled in the AUT, the AUT crashes due to external errors (C2). If the AUT's background process is affected by the GUI actions, removing delays may cause the background process to crash due to resource unavailability (C3). If the GUI actions trigger network requests, having no delays may cause a network-based crash (C5).
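Remove-delays is the simplest operator to sketch; as before, the 4-tuple transition encoding is our illustrative assumption.

```python
from typing import List, NamedTuple

class Transition(NamedTuple):
    v_start: int
    v_end: int
    action: str
    delay: float

def remove_delays(t: List[Transition]) -> List[Transition]:
    """M5: keep the event order of t but set every delay to 0, so events reach
    the AUT at the earliest possible time."""
    return [tr._replace(delay=0.0) for tr in t]
```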

**M6. Faster Swipe ($\delta\_{\rm FS}$).** $t' = \delta\_{\rm FS}(t)$ increases the speed of all swipe actions of a test case $t$. Let $z\_i^{fs}$ denote a faster version of $z\_i$, where $z\_i$ is a *swipe* action. Then, we define $\delta\_{\rm FS}$ on test cases with at least one *swipe* action as follows:

$$\delta\_{\rm FS}(t) = t\_1^{fs} \cdot t\_2^{fs} \cdot \dots \cdot t\_n^{fs} \tag{6}$$

where n is the length of test case t and

$$t\_i^{fs} = \begin{cases} (v\_i, v\_{i+1}, z\_i, d\_i) & z\_i \text{ is NOT a } swipe\\ (v\_i, v\_{i+1}, z\_i^{fs}, d\_i) & \text{otherwise} \end{cases}$$

If the information presented by the AUT is downloaded from a network or another application, swiping too fast may cause a network-based crash (C5) due to the network being unable to provide the necessary data, or an external error (C2). If the AUT is a game, swiping too fast may cause the AUT to throw an unhandled exception (C1).


#### **Algorithm 1.** Test Suite Minimization Algorithm

```
Require: TS: a test suite for the AUT;  M: AUT model
Ensure:  TS′: minimized test suite
 1: TS′ ← ∅
 2: for t ∈ {t : t ∈ TS ∧ t does not crash} do      ▷ Iterate over non-crashing test cases
 3:   if cov_M(TS′ ∪ {t}) > cov_M(TS′) then         ▷ Take only test cases that increase coverage
 4:     t′ ← argmin_i t_{1…i} s.t. cov_M(TS′ ∪ {t_{1…i}}) = cov_M(TS′ ∪ {t})   ▷ Shorten the test case
 5:     TS′ ← TS′ ∪ {t′}                            ▷ Add the shortened test case to TS′
 6:   end if
 7: end for
```

#### **Algorithm 2.** Test Case Mutation (TCM) Algorithm

```
Require: TS: a test suite;  X: timeout of the new test suite;  Δ: set of mutation operators
Ensure:  TS′: new test suite
 1: TS′ ← ∅
 2: x ← 0
 3: repeat
 4:   t ← random t ∈ TS                    ▷ Pick a random test case
 5:   δ ← random δ ∈ Δ s.t. t ≠ δ(t)       ▷ Pick a mutation operator that changes the test case
 6:   t′ ← δ(t)                            ▷ Apply the mutation operator to the test case
 7:   TS′ ← TS′ ∪ {t′}                     ▷ Add the mutated test case to the new test suite
 8:   x ← x + Σ_{(vs,ve,z,d)∈t′} d         ▷ Accumulate the total delay
 9: until x > X                            ▷ Repeat until the total delay exceeds the timeout
```

#### **4 Test Suite Minimization and Test Case Mutation**

Before mutating the existing test cases in a test suite $TS$, we first minimize $TS$. To do so, we define an edge coverage function $\text{cov}\_M(TS)$ over the AUT model $M$ as follows:

$$\text{cov}\_M(TS) = \frac{\#\text{ of unique transitions covered in the AUT Model } M \text{ by } TS}{\#\text{ of all transitions in the AUT Model } M} \tag{7}$$

We present our Test Suite Minimization approach in Algorithm 1. We iterate over all non-crashing test cases of the original test suite $TS$ in line 2. We use only non-crashing test cases because our goal is to generate crashes from non-crashing test cases via mutation. We check whether the test case $t$ increases the edge coverage in line 3. If it does, we shorten $t$ from its end by deleting transitions that do not contribute to the edge coverage, and add the shortened test case $t'$ to the minimized test suite.
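The minimization step can be sketched in Python. Transitions are plain `(v_start, v_end, action, delay)` tuples and `crashes` is a caller-supplied predicate; both are our illustrative assumptions.

```python
def edge_coverage(ts, model_edges):
    """cov_M(TS): fraction of the model's transitions covered by the suite (Eq. 7)."""
    covered = {tr[:3] for t in ts for tr in t}          # (v_start, v_end, action)
    return len(covered & model_edges) / len(model_edges)

def minimize(ts, model_edges, crashes):
    """Algorithm 1 sketch: keep coverage-increasing, non-crashing test cases,
    each shortened to the smallest prefix with the same coverage contribution."""
    minimized = []
    for t in ts:
        if crashes(t):
            continue                                    # only non-crashing test cases
        target = edge_coverage(minimized + [t], model_edges)
        if target > edge_coverage(minimized, model_edges):
            # shorten t from its end: smallest prefix reaching the same coverage
            for i in range(1, len(t) + 1):
                if edge_coverage(minimized + [t[:i]], model_edges) == target:
                    minimized.append(t[:i])
                    break
    return minimized
```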

We present our Test Case Mutation approach in Algorithm 2. We pick a random test case $t$ from the given $TS$ in line 4. Then, we pick a random mutation operator $\delta$ that changes $t$ in line 5. We mutate $t$ with $\delta$ and add the mutated test case $t'$ to $TS'$ until the total delay of $TS'$ exceeds the given timeout $X$.
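The mutation loop can be sketched as follows. Transitions are `(v_start, v_end, action, delay)` tuples, operators are functions on transition lists, and the applicability check assumes deterministic operators; all of this is our illustrative framing.

```python
import random

def tcm(ts, operators, timeout, rng=random):
    """Algorithm 2 sketch: mutate random test cases until the accumulated delay
    of the new suite exceeds the timeout X."""
    new_suite, total_delay = [], 0.0
    while total_delay <= timeout:
        t = rng.choice(ts)
        # keep only operators that actually change t (assumes deterministic ops)
        applicable = [op for op in operators if op(t) != t]
        if not applicable:
            continue                       # assumes some operator applies to some t
        mutated = rng.choice(applicable)(t)
        new_suite.append(mutated)
        total_delay += sum(tr[3] for tr in mutated)   # delay is the 4th element
    return new_suite
```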

**Fig. 2.** Motivating example (mutations are denoted in bold)

## **5 Motivating Example**

Figures 2a and b show a test suite and an AUT model, respectively. We generate this test suite and the AUT model by executing AndroFrame for one minute on an example AUT; one minute is enough to generate test cases for this example. We limit the maximum number of transitions per test case to five to keep the test cases small for ease of presentation. The test suite has four test cases: A, B, C, and D. Each row of a test case describes a delayed transition. The *click* action has coordinates, but we omit this information for the sake of simplicity.

Among the four test cases reported by AndroFrame, we take only the non-crashing test cases, A and D. In our example, we include D since it increases the edge coverage, and we exclude A since all of A's transitions are also D's transitions, i.e., A is subsumed by D. Then, we attempt to minimize test case D without reducing the edge coverage. In our example, we do not remove any transitions from D because all of them contribute to the edge coverage. We then generate mutated test cases by randomly applying mutation operators to D until we reach the one-minute timeout. Figure 2c shows an example mutated test suite. Test case Mutated 1 takes D and exercises the back button multiple times to stress the loop at state $v\_1$. Test case Mutated 2 clicks the hardware power button twice (doze off, doze on) between each transition; this pauses and resumes the AUT on our test devices. We then execute all mutated test cases on the AUT. Our example AUT in fact crashes when the loop on $v\_1$ is reexecuted more than eight times, and also crashes when the AUT is paused in state $v\_2$. When executed, our mutated test cases both reveal these crashes at their ninth transition, doubling the number of detected crashes.

**Fig. 3.** Number of total distinct crashes detected across time

#### **6 Evaluation**

In this section, we evaluate TCM via experiments and case studies. We show through experiments that TCM improves crash detection, and then show with case studies how we detect each crash pattern.

#### **6.1 Experiments**

We selected 100 AUTs (excluding the case studies described later) from the F-Droid benchmarks [7] for the experiments. To evaluate the improvement in crash detection, we first execute AndroFrame, Sapienz, PUMA, Monkey, and A<sup>3</sup>E for 20 min each on these applications with no mutations enabled. Then we execute TCM, giving AndroFrame 10 min to generate test cases and 10 min to mutate the generated test cases and replay them to detect more crashes. AndroFrame takes the maximum length of a test case as a parameter; we used its default, 80 transitions per test case.

Figure 3 shows the number of total distinct crashes detected by each tool across time. Whenever a crash occurs, the Android system logs the resulting stack trace. We say that two crashes are distinct if their stack traces differ.
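The distinctness criterion above amounts to deduplicating crashes by their stack-trace text; a minimal sketch (the hashing is our implementation choice, not the paper's):

```python
import hashlib

def count_distinct_crashes(stack_traces):
    """Two crashes are distinct iff their logged stack traces differ; hashing each
    trace gives a compact signature for deduplication over large trace sets."""
    return len({hashlib.sha1(tr.encode()).hexdigest() for tr in stack_traces})
```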

Our results show that AndroFrame detects more crashes than any other tool from very early on. TCM detects the same number of crashes as AndroFrame for the first 10 min (600 s), during which AndroFrame detects 15 crashes. In the last 10 min, TCM detects 14 more crashes whereas AndroFrame detects only 3 more. As a result, TCM detects 29 crashes in total whereas AndroFrame detects 18. As a last note, all other tools including AndroFrame seem to stabilize after 20 min, whereas TCM still finds many crashes near the timeout. This suggests that TCM may find even more crashes given a longer timeout.


**Fig. 4.** An example crash found only by TCM

Overall, TCM finds 14 more crashes than AndroFrame and 17 more crashes than Sapienz, the best among other tools.

We also investigate how much each mutation operator contributes to the number of detected crashes. Our observations reveal that M1 (δLS) detects one crash, M2 (δPR) detects four crashes, M3 (δCT) detects two crashes, M4 (δTCS) detects two crashes, M5 (δRD) detects four crashes, and M6 (δFS) detects one crash. These crashes add up to 14, which is the number of crashes detected by TCM in the last 10 min. This result shows that while all mutation operators contribute to the crash detection, M2 and M5 have the largest contribution.

We present and explain one crash that is found only by TCM in Fig. 4. Figure 4a shows an instance where AndroFrame generates and executes a test case $t$ on the Yahtzee application. Note that $t$ does not lead to a crash, but only a warning message. Figure 4b shows the instance where TCM mutates $t$ and executes the mutated test case $t'$. When $t'$ is executed, the application crashes and terminates. We note that this crash was not found by any other tool. Mao et al. [8] also report that Sapienz and Dynodroid did not find any crashes in this application.

#### **6.2 Case Studies**

In this section, we verify via case studies that the aforementioned crash patterns are observable on the Android platform, with one case study per crash pattern. These case studies helped us develop and fine-tune our mutation operators.

**Case Study 1.** Figure 5a shows a crashing activity of the *SoundBoard* application included in F-Droid benchmarks. Basically, the *coin* and *tube* buttons activate a third party library, AudioFlinger, to produce sound when tapped. AndroFrame generates test cases which tap these buttons. These test cases produce no crashes. Then, we mutate the test cases with TCM. When we apply *loop-stressing* (M1) on any of these buttons, AudioFlinger crashes due to overuse. AudioFlinger produces a fatal exception (C1) in Android logs. This crash does not cause an abnormal termination, but it causes the AUT to stop functioning (the AUT stops producing sounds until it is restarted).

**Fig. 5.** Case studies 1–5: unhandled exception (C1), external error (C2), resource unavailability (C3), semantic error (C4), and network-based crash (C5) examples

**Case Study 2.** Figure 5b shows a crashing activity of the *a2dpVol* application included in the F-Droid benchmarks, where AndroFrame fails to generate crashing test cases. We mutate these test cases with TCM. When we activate bluetooth (M4), tapping the *find devices* button produces a crash in the external *android.bluetooth.IBluetooth* application due to a missing method (C2), and the AUT terminates.

**Case Study 3.** Figure 5c shows a crashing activity of the *importcontacts* application included in the F-Droid benchmarks. The AUT handles the case where it fails to import contacts, as we show in the leftmost screen. Pausing the AUT at this screen causes the background process to abort and free its allocated memory (we show the related screen in the middle). However, the paused activity is not destroyed. If the user tries to resume this activity, the AUT crashes as we show in the rightmost screen, since the memory was freed earlier. TCM applies a pause-resume mutation (M2) and triggers this resource unavailability crash (C3).

**Case Study 4.** Figure 5d shows a crashing activity of the *aCal* application included in the F-Droid benchmarks. AndroFrame generates test cases with well-behaving text inputs, which produce no crashes. Then, we mutate the test cases with TCM. When we apply *change text* (M3) on the last text box and then tap the *configure* button, this produces a semantic error (C4). The AUT crashes and terminates.

**Case Study 5.** Figure 5e shows a crashing activity of the *Mirrored* application included in F-Droid benchmarks. When *wifi* is turned off, the AUT goes into offline mode and does not crash as shown in the leftmost screen. When we toggle *wifi* (M4), the AUT retrieves several articles as shown in the middle, but crashes when it fails to retrieve article contents due to a network-based crash (C5) as shown in the rightmost screen.

## **7 Discussion**

Although TCM is conceptually applicable to different GUI platforms, e.g. iOS or a desktop computer, there are three key challenges. First, our crash patterns are not guaranteed to exist or be observable in different platforms. Second, our mutation operators may not be applicable to those platforms, e.g. swipe may not be available as a gesture. Third, either an AUT model may be impossible to obtain or a replayable test case may be impossible to generate in those platforms. When all these challenges are addressed, we believe TCM should be applicable to not just Android, but other platforms as well.

TCM mutates test cases after they are generated. We could instead apply mutated inputs during test generation; however, this would require altering the test generation process, which may not be possible when a third-party test generation tool is used. Our approach is conceptually applicable to any test generation tool without altering it.

We use an edge coverage criterion to minimize a given test suite. As a result, the original test suite potentially covers more paths than the minimized test suite, since it explores the same edge in different contexts. Without minimization, however, the test cases in the suite are too many and too large to generate enough mutations to observe crashes within the given timeout. Therefore, we argue that minimizing the test suite improves the crash detection performance of TCM at the cost of the suite's completeness with respect to criteria stronger than edge coverage.

Although TCM detects crashes, it does not detect all possible bug patterns. Qin et al. [9] thoroughly classify Android bugs into two types: Bohrbugs and Mandelbugs. A Bohrbug is a bug whose reachability and propagation are simple; a Mandelbug is one whose reachability and propagation are complicated. Qin et al. further categorize Mandelbugs into Aging-Related Bugs (ARB) and Non-Aging-related Mandelbugs (NAM), and define five subtypes of NAM and six subtypes of ARB. TCM detects only the first two subtypes of NAM, TIM and SEQ, which are the only kinds of bugs triggered by user inputs. If a bug is TIM, the error is caused by the timing of inputs; if a bug is SEQ, the error is caused by the sequencing of inputs.

We note two key points about the crash patterns of TCM. First, the testing tools we compare TCM with detect only SEQ bugs; TCM adds the detection of TIM bugs. Second, Azim et al. [3] further divide SEQ and TIM bugs into six crash patterns, on which we base our crash patterns. We present external errors and permission violations as one crash pattern, since permission violations occur as attempts to communicate with external applications with insufficient permissions. As a result, we obtain five crash patterns.

We did not encounter any crash patterns other than the five crash patterns that we describe in Sect. 3. However, it is still possible to observe other crash patterns with our mutation operators due to emerging crash patterns caused by the fragmentation and fast development of the Android platform.

Our mutation operators insert multiple transitions into the test case, creating the problem of locating the fault-inducing transition. Given that the mutated test case detects a crash, fault localization can be achieved using a variant of *delta debugging* [10].
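The cited delta debugging idea can be sketched as a simplified `ddmin` over a crashing transition sequence; the granularity-doubling loop below follows Zeller's scheme, and `crashes` is a caller-supplied oracle that replays a candidate sequence on the AUT (our assumption for illustration).

```python
def ddmin(test, crashes):
    """Simplified delta debugging: shrink a crashing sequence to a 1-minimal
    subsequence that still crashes, by removing chunks (complements) at
    progressively finer granularity."""
    n = 2
    while len(test) >= 2:
        chunk = max(1, len(test) // n)
        reduced = False
        for i in range(0, len(test), chunk):
            candidate = test[:i] + test[i + chunk:]     # drop one chunk
            if candidate and crashes(candidate):
                test, n, reduced = candidate, max(n - 1, 2), True
                break
        if not reduced:
            if chunk == 1:
                break                   # 1-minimal: no single element removable
            n = min(n * 2, len(test))   # refine granularity
    return test
```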

We use regular expressions on the Android logs to detect crashes. In the experiments, we only detected *FATAL EXCEPTION*-labeled errors, as done in previous work [3,5], ignoring Application Not Responding (ANR) and other errors described by Carino and Andrews [11]. Although we believe TCM would still detect more crashes than pure AndroFrame (a fatal exception is the most common crash in Android), we will improve our crash detection procedure in future work to give more accurate results.
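A minimal sketch of this detection step follows; a plain substring match stands in for the paper's regular expressions, and the fixed number of context lines kept per crash is our simplification.

```python
def find_fatal_exceptions(logcat: str, context: int = 3):
    """Scan logcat text for FATAL EXCEPTION markers (the label Android's runtime
    logs on an uncaught exception); keep the marker line plus a few following
    lines of the stack trace as a rough crash record."""
    lines = logcat.splitlines()
    return ["\n".join(lines[i:i + context])
            for i, line in enumerate(lines) if "FATAL EXCEPTION" in line]
```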

We randomly selected 100 Android applications from the well-known F-Droid benchmarks also used by other testing tools [7]. In our previous work, we showed that these applications have characteristics similar to the rest of the F-Droid applications.

### **8 Related Work**

Test Case Mutation (TCM) differs from the well-known Mutation Testing (MT) [12], where mutations are inserted into the source code of an AUT to measure the quality of existing test cases; in TCM, we instead mutate existing test cases to increase the number of detected crashes. Oliveira et al. [13] were the first to suggest using MT for GUIs. Deng et al. [14] define several source-code-level mutation operators for Android applications to measure the quality of existing test suites.

The concept of Test Case Mutation is not new. In Android GUI testing, Sapienz [8] and EvoDroid [15] use evolutionary algorithms, and therefore mutation operators. Sapienz shuffles the order of events, whereas EvoDroid mutates a test case in two ways: (1) it transforms text inputs and (2) it injects, swaps, or removes events. TCM not only mutates text inputs but also introduces five more novel mutation operators. Furthermore, Sapienz and EvoDroid use their mutation operators for both exploration and crash detection, whereas we specialize TCM's mutation operators for crash detection only. In standard GUI testing, MuCRASH [16] defines special mutation operators on test cases at the source code level and uses them for crash reproduction, whereas ours is the first work that uses test case mutation to discover new crashes. Directed Test Suite Augmentation (DTSA), introduced by Xu et al. in 2010 [17], also mutates existing test cases, but with the goal of achieving a target branch coverage.

We implement TCM on AndroFrame [4], one of the state-of-the-art Android GUI testing tools. AndroFrame finds more crashes than other available alternatives in the literature such as A<sup>3</sup>E and Sapienz. These tools also generate replayable test cases and provide the utilities needed to replay them. We could mutate their test cases, but most of our mutations would not be applicable, for two reasons. First, A<sup>3</sup>E and Sapienz do not learn a model from which we can extract looping actions. Second, they do not support contextual state toggling. Implementing all of our mutations on top of these tools is possible, but requires a significant amount of engineering effort. Therefore, we implement TCM on top of AndroFrame.

Other black-box testing tools in the literature include A<sup>3</sup>E [18], SwiftHand [6], PUMA [19], DynoDroid [20], Sapienz [8], EvoDroid [15], CrashScope [5], and MobiGUITAR [21]. Of these, only EvoDroid, CrashScope, and MobiGUITAR are not publicly available.

Monkey is a simple random generation-based fuzz tester for Android that detects the largest number of crashes among black-box testing tools. Generation-based fuzz testing, which produces random or unexpected inputs, is a popular approach in Android GUI testing. Fuzzing can be completely random, as in Monkey, or more intelligent by detecting relevant events, as in DynoDroid [20]. TCM can be viewed as a mutation-based fuzz testing tool, modifying existing test cases rather than generating test cases from scratch. TCM could be implemented on top of Monkey or DynoDroid to improve the crash detection of these tools.

Baek and Bae [22] define a comparison criterion for Android GUI states. AndroFrame uses the maximum comparison level described in this work, which makes our models as fine-grained as possible for black-box testing.

## **9 Conclusion**

In this study, we developed a novel test case mutation technique that increases the number of detected crashes in Android applications. We defined six mutation operators for GUI test cases and related them to commonly occurring crash patterns in Android applications. We obtained test cases through a state-of-the-art Android GUI testing tool, AndroFrame, and showed with several case studies that our mutation operators uncover new crashes.

As future work, we plan to study a broader set of GUI actions, such as *rotation* and *doubleclick*. We will improve our mutation algorithm by sampling mutation operators from a probability distribution based on crash rates rather than a uniform distribution. We will also determine optimal timings for executing the test generator and TCM, rather than dividing the available time into two equal halves, and will further investigate Android crash patterns.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# CRETE: A Versatile Binary-Level Concolic Testing Framework

Bo Chen1(B) , Christopher Havlicek<sup>2</sup> , Zhenkun Yang<sup>2</sup> , Kai Cong<sup>2</sup> , Raghudeep Kannavara<sup>2</sup> , and Fei Xie<sup>1</sup>

> <sup>1</sup> Portland State University, Portland, OR 97201, USA {chenbo,xie}@pdx.edu <sup>2</sup> Intel Corporation, Hillsboro, OR 97124, USA {christopher.havlicek,zhenkun.yang,kai.cong, raghudeep.kannavara}@intel.com

Abstract. In this paper, we present crete, a versatile binary-level concolic testing framework, which features an open and highly extensible architecture allowing easy integration of concrete execution frontends and symbolic execution engine backends. crete's extensibility is rooted in its modular design where concrete and symbolic execution is loosely coupled only through standardized execution traces and test cases. The standardized execution traces are llvm-based, self-contained, and composable, providing succinct and sufficient information for symbolic execution engines to reproduce the concrete executions. We have implemented crete with klee as the symbolic execution engine and multiple concrete execution frontends such as qemu and 8051 Emulator. We have evaluated the effectiveness of crete on GNU Coreutils programs and TianoCore utility programs for UEFI BIOS. The evaluation of Coreutils programs shows that crete achieved comparable code coverage as klee directly analyzing the source code of Coreutils and generally outperformed angr. The evaluation of TianoCore utility programs found numerous exploitable bugs that were previously unreported.

## 1 Introduction

Symbolic execution [1] has become an increasingly important technique for automated software analysis, e.g., generating test cases, finding bugs, and detecting security vulnerabilities [2–11]. There have been many recent approaches to symbolic execution [12–22]. Generally speaking, these approaches can be classified into two categories: online symbolic execution (e.g., BitBlaze [4], klee [5], and s<sup>2</sup>e [6]) and concolic execution (a.k.a. offline symbolic execution, e.g., CUTE [2], DART [3], and SAGE [7]). Online symbolic execution closely couples Symbolic Execution Engines (see) with the System Under Test (sut) and explores all possible execution paths of the sut online at once. Concolic execution, on the other hand, decouples the see from the sut through traces: it concretely runs a single execution path of a sut and then symbolically executes it.

Both online and offline symbolic execution are facing new challenges, as computer software is experiencing an explosive growth, both in complexities and diversities, ushered in by the proliferation of cloud computing, mobile computing, and Internet of Things. Two major challenges are: (1) the sut involves many types of software for different hardware platforms and (2) the sut involves many components distributed on different machines and as a whole the sut cannot fit in any see. In this paper, we focus on how to extend concolic execution to satisfy the needs for analyzing emerging software systems. There are two major observations behind our efforts on extending concolic execution:


We present crete, a versatile binary-level concolic testing framework, which features an open and highly extensible architecture allowing easy integration of concrete execution frontends and symbolic execution backends. crete's extensibility is rooted in its modular design where concrete and symbolic execution is loosely coupled only through standardized execution traces and test cases. The standardized execution traces are llvm-based, self-contained, and composable, providing succinct and sufficient information for see to reproduce the concrete executions. The crete framework is composed of:


We have implemented the crete framework on top of qemu [23] and klee, particularly the tracing plugin for qemu, the replayer for klee, and the manager that coordinates qemu and klee to exchange runtime traces and test cases and manages the policies for prioritizing them. To validate crete's extensibility, we have also implemented a tracing plugin for the 8051 emulator [24]; the trace-based architecture of crete has enabled us to integrate such tracing frontends seamlessly. To demonstrate its effectiveness and capability, we evaluated crete on GNU Coreutils programs and TianoCore utility programs for UEFI BIOS, and compared it with klee and angr, two state-of-the-art open-source symbolic executors for automated program analysis at the source level and binary level, respectively.

The crete framework makes several key contributions:

– Versatile concolic testing. crete provides an open and highly extensible architecture allowing easy integration of different concrete and symbolic execution environments, which communicate with each other only by exchanging standardized traces and test cases. This significantly improves applicability and flexibility of concolic execution to emerging platforms and is amenable to leveraging new advancements in symbolic execution.


## 2 Related Work

DART [3] and CUTE [2] are both early representative works on concolic testing. They operate at the source code level; crete further extends concolic testing and targets closed-source binary programs. SAGE [7] is a Microsoft internal concolic testing tool that particularly targets X86 binaries on Windows. crete is platform agnostic: as long as a trace from concrete execution can be converted into the llvm-based trace format, it can be analyzed to generate test cases.

klee [5] is a source-level symbolic executor built on the llvm infrastructure [25] and is capable of generating high-coverage test cases for C programs. crete adopts klee as its see and extends it to perform concolic execution on standardized binary-level traces. s<sup>2</sup>e [6] provides a framework for developing tools for analyzing closed-source software programs. It augments a Virtual Machine (vm) with a see and path analyzers, and features a tight coupling of concrete and symbolic execution. crete takes a loosely coupled approach to the interaction of concrete and symbolic execution: it captures complete execution traces of the sut online and conducts whole-trace symbolic analysis offline.

BitBlaze [4] is an early representative work on binary analysis for computer security. BitBlaze and its follow-up works Mayhem [8] and MergePoint [12] focus on optimizing the tight coupling of concrete and symbolic execution to improve the effectiveness of detecting exploitable software bugs. crete has a different focus: providing an open architecture for binary-level concolic testing that enables flexible integration of various concrete and symbolic execution environments.

angr [14] is an extensible Python framework for binary analysis using VEX [26] as an intermediate representation (IR). It implements a number of existing analysis techniques and enables the comparison of different techniques within a single platform. angr must load a sut into its own virtual environment for analysis, so it has to model the real execution environment of the sut, such as system calls and common library functions. crete, in contrast, performs in-vivo binary analysis by analyzing binary-level traces captured from the unmodified execution environment of a sut. Also, angr needs to maintain execution states for all paths being explored at once, whereas crete reduces memory usage dramatically by analyzing a sut path by path and separating symbolic execution from tracing.

Our work is also related to fuzz testing [27], of which AFL [28] is a popular representative tool. Fuzzing is fast and quite effective for bug detection; however, it can easily get stuck when a specific input, such as a magic number, is required to pass a check and explore new paths of a program. Concolic testing guides the generation of test cases by solving constraints collected from source code or binary execution traces, and is quite effective in generating such complicated inputs. Fuzzing and concolic testing are therefore complementary software testing techniques.

## 3 Overview

During the design of the crete framework for binary-level concolic testing, we have identified the following design goals:


To achieve the goals above, we adopt an online/offline approach to concolic testing in the design of the crete framework:


This online tracing and offline test generation process is iterative: it repeats until all generated test cases are issued or time bounds are reached. We extend this process to satisfy our design goals as follows.

Fig. 1. crete architecture


## 4 Design

In this section, we present the design of crete with a vm as the concrete execution environment. We selected a vm because it allows complete access to the whole system for tracing runtime execution states and is generally available in the form of mature open-source projects.

#### 4.1 crete Architecture

As shown in Fig. 1, crete has four key components:

– crete Runner, a tiny helper program executing in the guest OS of the vm, which parses the configuration file and launches the target binary program (tbp) with the configuration and test cases;
– crete Tracer, a comprehensive tracing plug-in in the vm, which captures binary-level traces from the concrete execution of the tbp in the vm;
– crete Replayer, an extension of the see, which enables the see to perform concolic execution on the captured traces and to generate test cases;
– crete Manager, a coordinator that integrates the vm and the see, which manages the runtime traces captured and the test cases generated, coordinates the concrete and symbolic execution in the vm and the see, and iteratively explores the tbp.

crete takes a tbp and a configuration file as inputs, and outputs generated test cases along with a report of detected bugs. The manual effort and learning curve required to use crete are minimal: setting up the testing environment for the tbp in a crete-instrumented vm is virtually no different from doing so in a vanilla vm. The configuration file is an interface for users to configure parameters for testing a tbp, in particular the number and size of symbolic command-line inputs and symbolic files for test case generation.

#### 4.2 Standardized Runtime Trace

To enable the modular and plug-and-play design of crete, a standardized binary-level runtime trace format is needed. A trace in this format must capture sufficient information from the concrete execution so that the trace can be faithfully replayed within the see. To integrate a concrete execution environment into the crete framework, only a plug-in for the environment needs to be developed that stores concrete execution traces in the standard file format. Similarly, to integrate a see into crete, the engine only needs to be adapted to consume trace files in that format.

We define the standardized runtime trace format based on the llvm assembly language [31]. The reasons for selecting the llvm instruction set are: (1) it has become a de-facto standard for compiler design and program analysis [25,32]; (2) there are many program analysis tools based on the llvm assembly language [5,33–35]. A standardized binary-level runtime trace is packed as a self-contained llvm module that is directly consumable by an llvm interpreter. It is composed of: (1) a set of assembly-level basic blocks in the form of llvm functions; (2) a set of hardware states in the form of llvm global variables; (3) a set of crete-defined helper functions in llvm assembly; (4) a main function in llvm assembly. The set of assembly-level basic blocks is captured from a concrete execution of a tbp. It is normally translated from another format (such as qemu-ir) into llvm assembly, and each basic block is packed as an llvm function. The hardware states are runtime states along the execution of the tbp. They consist of CPU states, memory states, and possibly states of other hardware components, which are packed as llvm global variables. The helper functions are provided by crete to correlate the captured hardware states with the captured basic blocks and to open an interface to the see. The main function represents the concrete execution path of the tbp: it contains a sequence of calls to the captured basic blocks (llvm functions), interleaved with calls to the crete-defined helper functions with the appropriate hardware states (llvm global variables).

An example of a standardized crete runtime trace is shown in Fig. 2. The first column of this figure is a complete execution path of a program with given concrete inputs, in the format of assembly-level pseudo-code. Assume that the basic blocks BB\_1 and BB\_3 are of interest and are captured by crete Tracer, while the other basic blocks are not (see Sect. 4.3 for details). As shown in the second and third columns of the figure, hardware states are captured in two categories: the initial state, and side effects from basic blocks that are not captured. As shown in the fourth column of the figure, captured basic blocks are packed as llvm functions, and captured hardware states are packed as llvm global variables in


Fig. 2. Example of standardized runtime trace

the standardized trace. A main function is also added, making the trace a self-contained llvm module. The main function first invokes crete helper functions to initialize the hardware states, then calls into the first basic-block llvm function. Before it calls into the second basic-block llvm function, the main function invokes crete helper functions to update the hardware states. For example, before calling asm\_BB\_3, it calls the function sync\_state to update register r1 and memory location 0x5678, which are the side effects introduced by BB\_2.
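To make this structure concrete, the following Python sketch models the replay of such a trace. All names besides asm\_BB\_1, asm\_BB\_3, and sync\_state are invented for illustration; the real trace is an llvm module, not Python, and the concrete register values are made up.

```python
# Hypothetical Python model of a standardized crete trace. Captured
# hardware states become "global variables"; captured basic blocks become
# functions; main replays the concrete path.
cpu = {}      # register name -> value
memory = {}   # address -> value

def init_state(regs, mem):
    """Helper: install the captured initial CPU and memory state."""
    cpu.update(regs)
    memory.update(mem)

def sync_state(regs, mem):
    """Helper: apply side effects of uncaptured basic blocks (e.g. BB_2)."""
    cpu.update(regs)
    memory.update(mem)

def asm_BB_1():
    cpu["r0"] = memory[0x1234]          # captured block reading input

def asm_BB_3():
    cpu["r2"] = cpu["r0"] + cpu["r1"]   # captured block using the input

def main():
    # Initialize state, call captured blocks, sync side effects between them.
    init_state({"r0": 0, "r1": 0}, {0x1234: 7})
    asm_BB_1()
    sync_state({"r1": 5}, {0x5678: 9})  # side effects of the skipped BB_2
    asm_BB_3()
    return cpu["r2"]
```

Running `main()` reproduces the concrete path: BB\_2 itself is never executed, yet its effect on r1 still reaches asm\_BB\_3 through sync\_state.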

#### 4.3 Selective Binary-Level Tracing

A major part of a standardized trace is its assembly-level basic blocks, which are essentially binary-level instruction sequences representing a concrete execution of a tbp. It is challenging, and also unnecessary, to capture the complete execution of a tbp. First, software binaries can be very complex; if we captured the complete execution, the trace file could be prohibitively large and difficult for the see to consume and analyze. Second, an executing tbp commonly invokes many runtime libraries (such as libc) that are of no interest to testers. Therefore, an automated way of selecting the code of interest is needed.

crete utilizes Dynamic Taint Analysis (DTA) [36] to achieve selective tracing. The DTA algorithm is part of crete Tracer. It tracks the propagation of tainted values, normally specified by users, during the execution of a program. It works at the binary level with byte-wise granularity. Using DTA, crete Tracer captures only the basic blocks that operate on tainted values, recording just the side effects of all other basic blocks. For the example trace in Fig. 2, if the tainted value comes from the user's input to the program and is stored at memory location 0x1234, DTA captures basic blocks BB\_1 and BB\_3, because both operate on tainted values, while the other two basic blocks do not touch tainted values and are not captured.
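The selection policy can be sketched as follows. This is a minimal illustration of taint-driven block selection, not crete's actual DTA implementation; the block names and locations mirror the Fig. 2 example, while the function names are invented.

```python
# Minimal sketch of taint-driven selective tracing: capture a block only
# if it reads a tainted location, and propagate taint to its outputs.
tainted = set()  # tainted memory addresses / register names

def taint_source(loc):
    tainted.add(loc)

def exec_block(block_id, reads, writes, trace):
    """Simulate executing one basic block during concrete execution."""
    if any(loc in tainted for loc in reads):
        trace.append(block_id)   # block operates on taint: capture it
        tainted.update(writes)   # taint flows to everything it writes
        return True
    return False                 # untainted block: only side effects kept

trace = []
taint_source(0x1234)                                          # user input
exec_block("BB_1", reads=[0x1234], writes=["r0"], trace=trace)
exec_block("BB_2", reads=["r9"],   writes=["r1"], trace=trace)  # skipped
exec_block("BB_3", reads=["r0"],   writes=["r2"], trace=trace)
```

As in the paper's example, only BB\_1 and BB\_3 end up in the trace, because taint flows from 0x1234 into r0 and then into BB\_3.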

crete Tracer captures the initial CPU state by taking a copy of the CPU state before the first basic block of interest is executed. The initial CPU state is normally a set of register values. As shown in Fig. 2, the initial CPU state is captured before instruction (1). Naïvely, the initial memory state could be captured in the same way; however, the typical size of memory makes it impractical to dump entirely. To minimize the trace size, crete Tracer only captures the parts of memory that are accessed by the captured read instructions, such as instructions (1) and (9). The memory touched by the captured write instructions, such as instructions (3) and (11), can be ignored, because the state of this part of memory is already included in the captured write instructions. As a result, crete Tracer monitors every memory read instruction of interest, capturing memory as needed on the fly. In the example above, there are two memory read instructions. crete Tracer monitors both of them, but only keeps the memory state taken at instruction (1) as part of the initial memory state, because instructions (1) and (9) access the same address.
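The first-read capture policy amounts to a few lines. This is an illustrative sketch of the policy only (invented function names, invented values); the real tracer hooks binary read instructions inside the vm.

```python
# Sketch of on-the-fly initial-memory capture: a location is recorded
# only on the first captured read; later reads of the same address see
# state already implied by the captured instructions.
initial_memory = {}
seen = set()

def on_captured_read(addr, value):
    if addr not in seen:
        seen.add(addr)
        initial_memory[addr] = value   # becomes part of the initial state

on_captured_read(0x1234, 7)    # instruction (1): first read, kept
on_captured_read(0x1234, 42)   # instruction (9): same address, ignored
```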

The side effects on hardware states are captured by monitoring uncaptured write instructions. In the example in Fig. 2, instructions (5) and (6) write CPU registers, which causes side effects on the CPU state. crete Tracer monitors those instructions and keeps the updated register values as part of the runtime trace. As register r1 is updated twice by the two instructions, only the last update is kept in the runtime trace. Similarly, crete Tracer captures the side effect on memory at address 0x5678 by monitoring instruction (7).
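Side-effect capture is the mirror image of initial-state capture: instead of keeping the first read per location, keep the last write. A sketch with invented names, mirroring instructions (5)–(7) of the example:

```python
# Sketch of side-effect capture for uncaptured instructions: keep only
# the last write per location, since it subsumes all earlier writes.
side_effects = {}

def on_uncaptured_write(loc, value):
    side_effects[loc] = value          # later writes overwrite earlier ones

on_uncaptured_write("r1", 3)           # instruction (5)
on_uncaptured_write("r1", 5)           # instruction (6): last write wins
on_uncaptured_write(0x5678, 9)         # instruction (7): memory side effect
```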

#### 4.4 Concolic Test Case Generation

While a standardized trace is a self-contained llvm module that can be directly executed by an llvm interpreter, it also opens interfaces for the see to inject symbolic values for test case generation. Normally, a see injects symbolic values by making a variable in the source code symbolic. At the machine-code level, however, references to variables by name have become memory accesses by address; for instance, a reference to a concrete input variable of a program becomes an access to the piece of memory that stores the state of that variable. crete injects its self-defined helper function, crete\_make\_concolic, into the captured basic blocks while capturing the trace. This helper function provides the address and size of the piece of memory for injecting symbolic values, along with a name to improve the readability of the generated test cases. By intercepting this helper function, the see can introduce symbolic values at the right time and place.
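A see-side handler for this interface might look roughly like the following. Only the name crete\_make\_concolic comes from the paper; the data structures and return value are invented for illustration (klee's real handler is C++ inside the engine).

```python
# Sketch of a see handler for crete_make_concolic: replace the concrete
# bytes of a memory region with fresh named symbolic bytes, and record
# the region so test cases can later be concretized from it.
symbolic_regions = []

def crete_make_concolic(addr, size, name):
    symbolic_regions.append({"addr": addr, "size": size, "name": name})
    # One fresh symbolic byte per address in [addr, addr + size).
    return [f"{name}[{i}]" for i in range(size)]

syms = crete_make_concolic(0x1234, 4, "argv1")
```

The name argument ("argv1" here) is what makes the generated test cases human-readable, as the paper notes.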

A standardized trace in crete represents only a single path of a tbp, as shown in Fig. 3(a). Test case generation on this trace with naïve symbolic execution by the see is not effective, as it ignores the single-path nature of the trace. As illustrated in Fig. 3(b), naïve symbolic replay of a crete trace produces execution states and test cases exponential in the number of branches within the trace. As shown in Fig. 3(c), with concolic replay of a crete trace, the see in crete maintains only one execution state, requiring minimal memory, and generates a more compact set of test cases, whose number is linear in the number of branches of that trace. For a branch instruction in a captured basic

Fig. 3. Execution tree of the example trace from Fig. 2: (a) for concrete execution, (b) for symbolic execution, and (c) for concolic execution.

block, if both paths are feasible given the constraints collected so far on the symbolic values, the see in crete keeps only the execution state of the path taken by the original concrete execution in the vm, by adding the corresponding constraints of this branch instruction, while generating a test case for the other path by solving the constraints with the negated branch condition. This generated test case can lead the tbp to a different execution path later during concrete execution in the vm.
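The linear, single-state behavior of concolic replay can be sketched as a toy model. The brute-force "solver" over a one-byte input stands in for a real constraint solver; all names and the example branch conditions are invented for illustration.

```python
# Toy sketch of concolic replay over a single-path trace: follow the
# concretely-taken side of every branch, and for each branch emit one
# test case satisfying the negated condition.
def solve(constraints):
    """Brute-force stand-in for a constraint solver (1-byte input)."""
    for x in range(256):
        if all(c(x) for c in constraints):
            return x
    return None  # infeasible: no test case for this flip

def concolic_replay(branches, concrete_input):
    """branches: predicates on the symbolic input, oriented so that the
    concrete execution took the 'true' side of each."""
    path, tests = [], []
    for cond in branches:
        assert cond(concrete_input)          # sanity: trace is one path
        negated = lambda x, c=cond: not c(x)
        t = solve(path + [negated])          # flip only this one branch
        if t is not None:
            tests.append(t)                  # one test per feasible flip
        path.append(cond)                    # keep the taken direction
    return tests

# A trace where the concrete input 10 took: x < 100, then x != 42.
tests = concolic_replay([lambda x: x < 100, lambda x: x != 42], 10)
```

Only one path constraint set is ever maintained, and the number of generated tests is at most the number of branches, matching Fig. 3(c).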

#### 4.5 Bug and Runtime Vulnerability Detection

crete detects bugs and runtime vulnerabilities in two ways. First, all the native checks embedded in the see are evaluated during the symbolic replay of the trace captured from concrete execution. If a check is violated, a bug report is generated and associated with the test case that was used in the vm to produce this trace. Second, since crete does not change the native testing process and simply provides additional test cases that can be applied in that process, all the bug and vulnerability checks used in the native process remain effective in detecting bugs and vulnerabilities triggered by the crete-generated test cases. For instance, Valgrind [26] can be utilized to detect memory-related bugs and vulnerabilities along the paths explored by crete test cases.

## 5 Implementation

To demonstrate the practicality of crete, we have implemented its complete workflow with qemu [23] as the frontend and klee [5] as the backend. To demonstrate the extensibility of crete, we have also developed a tracing plug-in for the 8051 emulator, which readily replaces qemu.

crete Tracer for qemu: To give crete the best potential for supporting the various guest platforms supported by qemu, crete Tracer captures basic blocks in the format of qemu-ir. To convert captured basic blocks into the standardized trace format, we implemented a qemu-ir to llvm translator based on the x86 llvm translator of s2<sup>e</sup> [37]. We offload this translation from runtime tracing into a separate offline process to reduce the runtime overhead of crete Tracer. qemu maintains its own virtual states to emulate the physical hardware states of a guest platform; for example, it utilizes a virtual memory state and a virtual CPU state to emulate the states of physical memory and the CPU. These virtual states of qemu are essentially source-level structs. crete Tracer captures hardware states by monitoring the runtime values of those structs. qemu emulates hardware operations by manipulating these virtual states through corresponding helper functions defined in qemu; crete Tracer captures the side effects on the virtual hardware states by monitoring the invocations of those helper functions. As a result, the initial hardware states being captured are the runtime values of these qemu structs, and the side effects being captured are the side effects on those structs from the uncaptured instructions.
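Monitoring helper-function invocations for side effects can be pictured with a small wrapper. This is purely an illustrative model in Python (the real tracer hooks qemu's C helpers); the struct fields and helper name are invented.

```python
# Sketch of capturing side effects by wrapping qemu-style helper
# functions that mutate a virtual hardware struct: diff the struct
# before and after each helper call and record what changed.
virtual_cpu = {"eax": 0, "ebx": 0}   # stand-in for qemu's CPU struct
captured_side_effects = {}

def monitored(helper):
    def wrapper(*args):
        before = dict(virtual_cpu)
        helper(*args)
        for reg, val in virtual_cpu.items():
            if before[reg] != val:
                captured_side_effects[reg] = val  # record changed state
    return wrapper

@monitored
def helper_mov_eax(value):           # stand-in for a qemu helper
    virtual_cpu["eax"] = value

helper_mov_eax(0x42)
```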

crete Replayer for klee: klee takes as input llvm modules compiled from C source code. As a crete trace is a self-contained llvm module, crete Replayer mainly injects symbolic values and achieves concolic test generation. To inject symbolic values, crete Replayer provides a special function handler for the crete interface function crete\_make\_concolic. klee is natively an online symbolic executor, which forks execution states at each feasible branch and explores all execution paths by maintaining multiple execution states simultaneously. To achieve concolic test generation, crete Replayer extends klee to generate test cases only for feasible branches without forking states.

crete Tracer for the 8051 Emulator: The 8051 emulator executes an 8051 binary directly by interpreting its instructions sequentially. For each type of instruction, the emulator provides a helper function; interpreting an instruction entails calling this function to compute and update the relevant registers and memory states. The tracing plug-in for the 8051 emulator extends the interpreter: when the interpreter executes an instruction, an llvm call to its corresponding helper function is appended to the runtime trace. The 8051 instruction-processing helper functions are compiled into llvm and incorporated into the runtime trace, serving as the helper functions that map the captured instructions to the captured runtime states. The initial runtime state is captured from the 8051 emulator before the first instruction is executed. The resulting trace has the same format as that from qemu and is readily consumable by klee.
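The tracing interpreter pattern can be sketched as follows. The instruction mnemonics and helper names are invented stand-ins; in the real plug-in the trace entries are llvm calls to the compiled helper functions rather than Python strings.

```python
# Sketch of a tracing interpreter in the style of the 8051 plug-in:
# executing an instruction both updates emulator state via its helper
# and appends a call to that helper to the runtime trace.
regs = {"acc": 0}
trace = []   # in the real plug-in: llvm calls to compiled helpers

def helper_inc_acc():
    regs["acc"] += 1

def helper_clr_acc():
    regs["acc"] = 0

HELPERS = {"INC A": helper_inc_acc, "CLR A": helper_clr_acc}

def interpret(program):
    for insn in program:
        helper = HELPERS[insn]
        trace.append(helper.__name__)   # record the call in the trace
        helper()                        # emulate the instruction

interpret(["INC A", "INC A", "CLR A"])
```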

## 6 Evaluation

In this section, we present the evaluation results of crete from its application to GNU Coreutils [38] and the TianoCore utility programs for UEFI BIOS [39]. These evaluations demonstrate that crete generates test cases that are as effective in achieving high code coverage as those of state-of-the-art tools for automated test case generation, and that it can detect serious, deeply embedded bugs.

#### 6.1 GNU Coreutils

Experiment Setup. GNU Coreutils is a package of utilities widely used in Unix-like systems. The 87 programs from Coreutils (version 6.10) contain 20,559 lines of code, 988 functions, and 14,450 branches according to lcov [40]. The program sizes range from 18 to 1,475 lines, from 2 to 120 functions, and from 6 to 1,272 branches. Coreutils is an often-used benchmark for evaluating automated program analysis systems, including klee, MergePoint, and others [5,12,41], which is why we chose it as the benchmark for comparison with klee and angr.

crete and angr generate test cases from program binaries without debug information, while klee requires program source code. To measure and compare the effectiveness of the test cases generated by the different systems, we rerun those tests on binaries compiled with coverage instrumentation and calculate code coverage with lcov. Note that we only calculate the coverage of the code in GNU Coreutils itself, and do not compute the coverage of library code.

We adopted the configuration parameters for those programs from klee's experiment instructions<sup>1</sup>. As specified in the instructions, we ran klee on each program for one hour with a memory limit of 1 GB. We increased the memory limit to 8 GB for the experiment on angr, while using the same timeout of one hour. crete utilizes a different timeout strategy, defined as *no new instructions being covered within a given time bound*; we set this timeout to 15 min in this experiment. The same timeout strategy was used by DASE [41] for its evaluation on Coreutils. We conducted our experiments on a desktop with an Intel Core i7-3770 3.40 GHz CPU and 16 GB of memory running 64-bit Ubuntu 14.04.5. We built klee from its release v1.3.0 with llvm 3.4, released on November 30, 2016. We built angr from its mainline on GitHub at revision e7df250, committed on October 11, 2017. crete uses Ubuntu 12.04.5 as the guest OS for its vm frontend in our experiments.
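crete's coverage-plateau timeout can be sketched as a simple loop. This is an illustrative model only (invented names, logical iterations in place of wall-clock time): stop once no new instructions have been covered for a whole time bound.

```python
# Sketch of a "no new coverage within a time bound" stopping rule,
# simulated over per-iteration sets of covered instruction addresses.
def run_until_coverage_plateaus(iterations, bound):
    """Return the number of iterations executed before the plateau
    timeout fires (or all iterations, if coverage keeps growing)."""
    covered = set()
    last_progress = 0
    for i, new in enumerate(iterations):
        if new - covered:               # any instruction not seen before?
            covered |= new
            last_progress = i
        elif i - last_progress >= bound:
            return i                    # no new coverage for `bound` steps
    return len(iterations)

# Coverage grows at iterations 0 and 1, then plateaus; with bound=2 the
# run stops two iterations after the last progress.
steps = run_until_coverage_plateaus(
    [{1, 2}, {2, 3}, {3}, {2}, {1, 3}, {2}], bound=2)
```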


Table 1. Comparison of overall and median coverage by klee, angr, and crete on Coreutils.

Comparison with klee and angr. As shown in Table 1, our experiments demonstrate that crete achieves test coverage comparable to klee and generally outperforms angr. The major advantage of klee over crete is that it works on source code with all semantic information available. When the program size is small, symbolic execution is capable of exploring all feasible paths

<sup>1</sup> http://klee.github.io/docs/coreutils-experiments/.


Table 2. Distribution comparison of coverage achieved by klee, angr, and crete on Coreutils.

with given resources, such as time and memory. This is why klee achieves very high code coverage, such as line coverage over 90%, on more programs than crete, as shown in Table 2. However, klee has to maintain execution states for all paths being explored at once, a limitation that grows with program size. Moreover, klee analyzes programs within its own virtual environment with simplified models of the real execution environment. These models sometimes benefit klee by reducing the complexity of the tbp, but sometimes hurt it by introducing environmental inaccuracies. This is why crete gradually catches up overall, as shown in Table 2. Specifically, crete achieves higher line coverage on 33 programs, lower on 31 programs, and the same on the remaining 23 programs. Figure 4(a) shows the coverage differences of crete over klee on all 87 Coreutils programs. Note that our coverage results for klee differ from those in klee's paper. As discussed and reported in previous works [12,41], the coverage differences are mainly due to major code changes in klee, an architecture change from 32-bit to 64-bit, and whether manual system call failures are introduced.

angr shares klee's limitation of having to maintain multiple states and provide models of the execution environment, while also sharing crete's disadvantage of having no access to semantic information. Moreover, angr models the environment at the machine level to support various platforms, which is more challenging than klee's source-level modeling. In addition, we found and reported several crashes of angr during this evaluation, which also affected its results. This is why angr performs worse than both klee and crete in this experiment. Figure 4(b) shows the coverage differences of crete over angr on all 87 Coreutils programs. While crete outperformed angr on the majority of the programs, there is one program, printf, on which angr achieved over 40% better line coverage than crete, as shown in the leftmost column of Fig. 4(b). The reason is that printf uses many string routines from libc to parse inputs, and angr provides effective models for those string routines. Similarly, klee works much better on printf than crete.

Fig. 4. Line coverage difference on Coreutils by crete over klee and angr: positive values mean crete is better, and negative values mean crete is worse.

Coverage Improvement over Seed Test Case. Since crete is a concolic testing framework, it needs an initial seed test case to start testing a tbp. The goal of this experiment is to show that crete can significantly increase the coverage achieved by the seed test case that the user provides. To demonstrate the effectiveness of crete, we set the non-file arguments, the content of the input file, and stdin to zeros as the seed test case. Of course, well-crafted test cases from users would be more meaningful and effective as initial test cases. Figure 5 shows the coverage improvement for each program. On average, the initial seed test case covers 17.61% of lines, 29.55% of functions, and 11.11% of branches. crete improves line coverage by 56.71%, function coverage by 53.44%, and branch coverage by 52.14%. The overall coverage improvement on all 87 Coreutils programs is significant.

Fig. 5. Coverage improvement over seed test case by crete on GNU Coreutils

Bug Detection. In our experiment on Coreutils, crete was able to detect all three bugs in mkdir, mkfifo, and mknod that were detected by klee. This demonstrates that crete does not sacrifice bug detection capability while working directly on binaries without debug or high-level semantic information.

#### 6.2 TianoCore Utilities

Experiment Setup. The TianoCore utility programs are part of the open-source project EDK2 [42], a cross-platform firmware development environment from Intel. It includes 16 command-line programs used to build BIOS images. The TianoCore utility programs we evaluated are from the mainline on GitHub at revision 75ce7ef, committed on April 19, 2017. According to lcov, the 16 TianoCore utility programs contain 8,086 lines of code, 209 functions, and 4,404 branches. Note that we only calculate the coverage of the code for the TianoCore utility programs themselves, and do not compute the coverage of libraries.

The configuration parameters we used for those utility programs are based on a rough high-level understanding of the programs from their user manuals. We assigned each program a long argument of 16 bytes and four short arguments of 2 bytes, along with a file of 10 kilobytes. We conducted our experiments on the same platform with the same host and guest OS as for the Coreutils evaluation, and also set the timeout to 15 min for each program.

High-Coverage Test Generation from Scratch. For all the arguments and file contents in the parameter configuration, we set the initial values to binary zeros to serve as the seed test case for crete. Figure 6 shows that crete delivered high code coverage, above 80% line coverage, on 9 out of 16 programs. On average, the initial seed test case covers 14.56% of lines, 28.71% of functions, and 12.38% of branches. crete improves line coverage by 43.61%, function coverage by 41.63%, and branch coverage by 44.63%. Some programs obtained lower coverage because of: (1) inadequate configuration parameters; (2) error-handling code triggered only by failed system calls; (3) symbolic indices for arrays and files not being well handled by crete.

Fig. 6. Coverage improvement over seed test case by crete on TianoCore utilities

Bug Detection. To further demonstrate crete's capability in detecting deeply embedded bugs, we performed a set of evaluations with crete on the TianoCore utility programs focusing on concolic files. From the build process of a tutorial image, OvmfPkg, from EDK2, we extracted 509 invocations of TianoCore utility programs and the corresponding intermediate files generated, among which 37 unique invocations cover 6 different programs. Taking the parameter configurations from those 37 invocations and using their files as seed files, we ran crete with a timeout of 2 h on each setup, making only the files symbolic.


Table 3. Classified crashes found by crete on Tianocore utilities: 84 unique crashes from 8 programs

Combining the experiments on concolic arguments and concolic files, crete found 84 distinct crashes (by stack hash) across eight TianoCore utility programs. We used a GDB extension [43] to classify the crashes, a popular approach among AFL users [44]. Table 3 shows that crete found various kinds of crashes, including many exploitable ones, such as stack corruption, heap errors, and write access violations. Eight crashes were found with concolic arguments, while the other 76 were found with concolic files. We reported all these crashes to the TianoCore development team. So far, most of the crashes have been confirmed as real bugs, and ten of them have been fixed.
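Deduplicating crashes "by stack hash" means hashing the top frames of each crash backtrace and counting equal hashes as one crash. A sketch with invented frame names (the paper used a GDB extension rather than this code):

```python
# Sketch of crash deduplication by stack hash: crashes whose top
# backtrace frames hash to the same value count as one unique crash.
import hashlib

def stack_hash(frames, depth=5):
    key = "|".join(frames[:depth])     # top frames identify the crash site
    return hashlib.sha1(key.encode()).hexdigest()[:12]

crashes = [
    ["memcpy", "LoadFile", "main"],
    ["memcpy", "LoadFile", "main"],    # same site: duplicate of the first
    ["vsprintf", "FormatName", "main"],
]
unique = {stack_hash(c) for c in crashes}
```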

We now elaborate on a few sample crashes to demonstrate that the bugs found by crete are significant. VfrCompile crashed with a segmentation fault due to stack corruption when the input file name is malformed, e.g., '\\.%\*a' as generated by crete. This bug is essentially a format string exploit. VfrCompile uses the function vsprintf() to compose a new string from a format string and stores it in a local array of fixed size. When the format string is malicious, such as '%\*a', vsprintf() keeps reading from the stack and overflows the local buffer, causing stack corruption. Note that crete generated a well-formed prefix for the input, '\\.', which is required to pass a preprocessing check in VfrCompile, so that the malicious format string could reach the vulnerable code.

crete also exposed several heap errors in GenFw by generating malformed input files. GenFw is used to generate a firmware image from an input file. The input file must follow a very precise format: GenFw checks the signature bytes to decide the input file type, uses complex nested structs to parse different sections of the file, and conducts many checks to ensure the input file is well-formed. Starting from a seed file of 223 kilobytes extracted from EDK2's build process, crete automatically mutated 29 bytes in the file header. The mutated bytes introduced a particular combination of file signature and section sizes and offsets. This combination passed all checks on the file format and directed GenFw to a vulnerable function that mistakenly replaces the buffer already allocated for storing the input file with a much smaller one. Follow-up accesses to this malformed buffer caused an overflow and heap corruption.

## 7 Conclusions and Future Work

In this paper, we have presented crete, a versatile binary-level concolic testing framework designed with an open and highly extensible architecture that allows easy integration of concrete execution frontends and symbolic execution backends. At the core of this architecture is a standardized format for binary-level execution traces, which is llvm-based, self-contained, and composable. Standardized execution traces are captured by the concrete execution frontends and provide succinct yet sufficient information for the symbolic execution backends to reproduce the concrete executions. We have implemented crete with klee as the symbolic execution engine and multiple concrete execution frontends, namely qemu and the 8051 emulator. The evaluation on Coreutils programs shows that crete achieved code coverage comparable to klee analyzing the source code of Coreutils directly, and generally outperformed angr. The evaluation on TianoCore utility programs found numerous exploitable bugs.

We are assembling a suite of 8051 binaries for evaluating crete and will report the results in the near future. As further future work, we will develop new crete tracing plugins, e.g., for concrete execution on physical machines based on PIN. With these new plugins, we will focus on synthesizing abstract system-level traces from trace segments captured from binaries executing on various platforms. Another technical challenge that we plan to address is handling symbolic indices for arrays and files, so that code coverage can be further improved.

Acknowledgment. This research received financial support from National Science Foundation Grant #CNS-1422067, Semiconductor Research Corporation Contract #2708.001, and gifts from Intel Corporation.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Family-Based Software Development

## **Abstract Family-Based Model Checking Using Modal Featured Transition Systems: Preservation of CTL\***

Aleksandar S. Dimovski(B)

Faculty of Informatics, Mother Teresa University, Skopje, Republic of Macedonia aleksandar.dimovski@unt.edu.mk

**Abstract.** Variational systems allow many custom variants to be built efficiently by using features (configuration options) to mark variable functionality. In many application areas, their quality assurance and formal verification are of paramount importance. Family-based model checking allows simultaneous verification of all variants of a variational system in a single run by exploiting the commonalities between the variants. Yet, its computational cost still greatly depends on the number of variants, which is often huge.

In this work, we show how to achieve efficient family-based model checking of CTL* temporal properties using variability abstractions and off-the-shelf (single-system) tools. We use variability abstractions for deriving abstract family-based model checking, where the variability model of a variational system is replaced with an abstract (smaller) version of it, called a *modal featured transition system*, which preserves the satisfaction of both universal and existential temporal properties, as expressible in CTL*. Modal featured transition systems contain two kinds of transitions, termed may and must transitions, which are defined by the conservative (over-approximating) abstractions and their dual (under-approximating) abstractions, respectively. The variability abstractions can be combined with different partitionings of the set of variants to infer suitable divide-and-conquer verification plans for the variational system. We illustrate the practicality of this approach on several variational systems.

## **1 Introduction**

Variational systems appear in many application areas and for many reasons. Efficient methods to achieve customization, such as *Software Product Line Engineering* (SPLE) [8], use *features* (configuration options) to control the presence and absence of variable functionality [1]. Family members, called *variants* of a *variational system*, are specified in terms of the features selected for that particular variant. The reuse of code common to multiple variants is thereby maximized. The SPLE method is particularly popular in the embedded and critical systems domain (e.g. cars, phones). In these domains, rigorous verification and analysis are very important. Among the methods in current practice, *model checking* [2] is a well-studied technique used to establish that temporal logic properties hold for a system.

Variability and SPLE are major enablers, but also a source of complexity. Obviously, the size of the configuration space (the number of variants) is the limiting factor for the feasibility of any verification technique. Exponentially many variants can be derived from only a few configuration options, a problem referred to as *the configuration space explosion* problem. A simple "brute-force" application of a single-system model checker to each variant is infeasible for realistic variational systems, due to the sheer number of variants. It is also wasteful, because the same execution behavior is checked multiple times whenever it is shared by several variants. Another, more efficient, verification technique [5,6] is based on using compact representations for modelling variational systems, which incorporate the commonality within the family. We will call these representations variability models (or featured transition systems). Each behavior in a variability model is associated with the set of variants able to produce it. A specialized family-based model checking algorithm executed on such a model checks an execution behavior only once, regardless of how many variants include it. These algorithms model check all variants simultaneously in a single run and pinpoint the variants that violate properties. Unfortunately, their performance *still* heavily depends on the size and complexity of the configuration space of the analyzed variational system. Moreover, maintaining specialized family-based tools is also an expensive task.

In order to address these challenges, we propose to use standard, single-system model checkers with an alternative, externalized way to combat the configuration space explosion. We apply so-called *variability abstractions* to a variability model that is too large to handle ("configuration space explosion"), producing a more *abstract model*, which is smaller than the original one. We abstract from certain aspects of the configuration space, so that many of the configurations (variants) become indistinguishable and can be collapsed into a single abstract configuration. The abstract model is constructed in such a way that if some property holds for the abstract model, it also holds for the concrete model. Our technique extends the scope of existing over-approximating variability abstractions [14,19], which currently support the verification of universal properties only (LTL and ∀CTL). Here we construct abstract variability models which can be used to check arbitrary formulae of CTL*, thus including arbitrarily nested path quantifiers. We use modal featured transition systems (MFTSs) for representing abstract variability models. MFTSs are featured transition systems (FTSs) with two kinds of transitions, *must* and *may*, expressing behaviours that necessarily occur (must) or possibly occur (may). We use the standard conservative (over-approximating) abstractions to define may transitions, and their dual (under-approximating) abstractions to define must transitions. Therefore, MFTSs perform both over- and under-approximation, admitting both universal and existential properties to be deduced. Since MFTSs preserve all CTL* properties, we can verify any such property on the concrete variability model (which is given as an FTS) by verifying it on an abstract MFTS. Any model checking problem on modal transition systems (resp., MFTSs) can be reduced to two traditional model checking problems on standard transition systems (resp., FTSs). The overall technique relies on partitioning and abstracting concrete FTSs until we obtain models with so little variability (or none at all) that it is feasible to complete their model checking in the brute-force fashion using standard single-system model checkers. Experiments show that the proposed technique achieves performance gains compared with family-based model checking.

## **2 Background**

In this section, we present the background used in later developments.

*Modal Featured Transition Systems.* Let F = {A1, ..., An} be a finite set of Boolean variables representing the features available in a variational system. A specific subset of features, k ⊆ F, known as a *configuration*, specifies a *variant* (valid product) of a variational system. We assume that only a subset K ⊆ 2^F of configurations is *valid*. An alternative representation of configurations is based upon propositional formulae. Each configuration k ∈ K can be represented by the formula k(A1) ∧ ... ∧ k(An), where k(Ai) = Ai if Ai ∈ k, and k(Ai) = ¬Ai if Ai ∉ k, for 1 ≤ i ≤ n. We will use both representations interchangeably.

We recall the basic definitions of a transition system (TS) and a modal transition system (MTS), which we will use to describe the behaviors of single systems.

**Definition 1.** *A transition system (TS) is a tuple* T = (S, Act, trans, I, AP, L)*, where* S *is a set of states;* Act *is a set of actions;* trans ⊆ S × Act × S *is a transition relation;* I ⊆ S *is a set of initial states;* AP *is a set of atomic propositions; and* L : S → 2^AP *is a labelling function specifying which propositions hold in each state. We write* s1 --λ--> s2 *whenever* (s1, λ, s2) ∈ trans*.*

An *execution* (behaviour) of a TS T is an *infinite* sequence ρ = s0 λ1 s1 λ2 ... with s0 ∈ I such that si --λi+1--> si+1 for all i ≥ 0. The *semantics* of the TS T, denoted [[T]]_TS, is the set of its executions.

MTSs [26] are a generalization of transition systems that allows describing not just the sum of all behaviors of a system but also an over- and under-approximation of the system's behaviors. An MTS is a TS equipped with two transition relations: *must* and *may*. The former (must) specifies the required behavior, while the latter (may) specifies the allowed behavior of a system.

**Definition 2.** *A modal transition system (MTS) is a tuple* M = (S, Act, trans_may, trans_must, I, AP, L)*, where* trans_may ⊆ S × Act × S *describes the may transitions of* M*, and* trans_must ⊆ S × Act × S *describes the must transitions of* M*, such that* trans_must ⊆ trans_may*.*

The intuition behind the inclusion trans_must ⊆ trans_may is that transitions that are necessarily present (trans_must) are also possibly present (trans_may). A *may-execution* in M is an execution with all its transitions in trans_may, whereas a *must-execution* in M is an execution with all its transitions in trans_must. We use [[M]]^may_MTS to denote the set of all may-executions in M, and [[M]]^must_MTS to denote the set of all must-executions in M.

An FTS describes the behavior of a whole family of systems in a *superimposed* manner. This means that it combines the models of many variants in a single monolithic description, where the transitions are guarded by a *presence condition* that identifies the variants they belong to. The presence conditions ψ are drawn from the set of feature expressions, *FeatExp*(F), which are propositional logic formulae over F: ψ ::= *true* | A ∈ F | ¬ψ | ψ1 ∧ ψ2. The presence condition ψ of a transition specifies the variants in which the transition is enabled. We write [[ψ]] to denote the set of variants from K that satisfy ψ, i.e. k ∈ [[ψ]] iff k |= ψ.
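As an illustration of these two interchangeable representations (not part of the paper's tooling; all names are hypothetical), the following sketch encodes configurations as sets of enabled features, presence conditions ψ as predicates over configurations, and computes [[ψ]] by enumeration over the valid configurations K:

```python
# Illustrative sketch only: configurations as feature sets, presence
# conditions as predicates, and [[psi]] computed by enumerating K.
from itertools import chain, combinations

FEATURES = ["v", "s", "t", "c", "f"]  # hypothetical feature set F

def all_subsets(features):
    """Enumerate 2^F as frozensets of enabled features."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(features, r) for r in range(len(features) + 1))]

# A presence condition psi is any predicate over configurations.
has_f = lambda k: "f" in k        # the feature expression "f"
not_c = lambda k: "c" not in k    # the feature expression "¬c"

def sat(psi, K):
    """[[psi]]: the valid configurations k in K with k |= psi."""
    return {k for k in K if psi(k)}

# Only a subset of 2^F is valid (here three toy configurations).
K = {frozenset(s) for s in [{"v", "s"}, {"v", "s", "f"}, {"v", "s", "c"}]}
assert len(all_subsets(FEATURES)) == 2 ** 5
assert sat(has_f, K) == {frozenset({"v", "s", "f"})}
assert sat(not_c, K) == {frozenset({"v", "s"}), frozenset({"v", "s", "f"})}
```

A BDD- or SAT-based representation would replace the explicit enumeration in practice; the sketch only fixes the semantics of [[ψ]].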

**Definition 3.** *A featured transition system (FTS) is a tuple* F = (S, Act, trans, I, AP, L, F, K, δ)*, where* S, Act, trans, I, AP*, and* L *are defined as in a TS;* F *is the set of available features;* K *is a set of valid configurations; and* δ : trans → FeatExp(F) *is a total function decorating transitions with presence conditions (feature expressions).*

The *projection* of an FTS F to a variant k ∈ K, denoted π_k(F), is the TS (S, Act, trans', I, AP, L), where trans' = {t ∈ trans | k |= δ(t)}. We lift the definition of *projection* to sets of configurations K' ⊆ K, denoted π_K'(F), by keeping the transitions admitted by at least one of the configurations in K'. That is, π_K'(F) is the FTS (S, Act, trans', I, AP, L, F, K', δ), where trans' = {t ∈ trans | ∃k ∈ K'. k |= δ(t)}. The *semantics* of an FTS F, denoted [[F]]_FTS, is the union of the behaviours of its projections onto all valid variants k ∈ K, i.e. [[F]]_FTS = ∪_{k∈K} [[π_k(F)]]_TS.
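Projection is just a filter on transitions. A minimal sketch (hypothetical names; Python predicates stand in for feature expressions): π_k keeps exactly the transitions whose presence condition is satisfied by k.

```python
# Sketch of pi_k(F): keep the transitions t with k |= delta(t).
def project(transitions, delta, k):
    """transitions: set of (source, action, target) triples;
    delta: maps each transition to its presence condition, here a
    predicate over configurations; k: a configuration (feature set)."""
    return {t for t in transitions if delta[t](k)}

# Two transitions of a toy vending machine: "pay" is always present,
# "free" is guarded by the presence condition f.
trans = {(1, "pay", 2), (1, "free", 3)}
delta = {(1, "pay", 2): lambda k: True,
         (1, "free", 3): lambda k: "f" in k}

assert project(trans, delta, {"v", "s"}) == {(1, "pay", 2)}
assert project(trans, delta, {"v", "s", "f"}) == trans
```

The lifted projection π_K' would test `any(delta[t](k) for k in Ks)` instead of a single configuration.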

We will use modal featured transition systems (MFTSs) for representing abstractions of FTSs. MFTSs are a variability-aware extension of MTSs.

**Definition 4.** *A modal featured transition system (MFTS) is a tuple* MF = (S, Act, trans_may, trans_must, I, AP, L, F, K, δ_may, δ_must)*, where* trans_may *and* δ_may : trans_may → FeatExp(F) *describe the may transitions of* MF*, and* trans_must *and* δ_must : trans_must → FeatExp(F) *describe the must transitions of* MF*.*

The *projection* of an MFTS MF to a variant k ∈ K, denoted π_k(MF), is the MTS (S, Act, trans'_may, trans'_must, I, AP, L), where trans'_may = {t ∈ trans_may | k |= δ_may(t)} and trans'_must = {t ∈ trans_must | k |= δ_must(t)}. We define [[MF]]^may_MFTS = ∪_{k∈K} [[π_k(MF)]]^may_MTS and [[MF]]^must_MFTS = ∪_{k∈K} [[π_k(MF)]]^must_MTS.

*Example 1.* Throughout this paper, we will use a beverage vending machine as a running example [6]. Figure 1 shows the FTS of a VendingMachine family. It has five features, and each of them is assigned an identifying letter and a color. The features are: VendingMachine (denoted by letter v, in black), the mandatory base feature of purchasing a drink, present in all variants; Tea (t, in red), for

**Fig. 1.** The FTS for VendingMachine. (Color figure online)

**Fig. 2.** <sup>π</sup>{v,s}(VendingMachine)

serving tea; Soda (s, in green), for serving soda, which is a mandatory feature present in all variants; CancelPurchase (c, in brown), for canceling a purchase after a coin is entered; and FreeDrinks (f, in blue), for offering free drinks. Each transition is labeled by an *action* followed by a *feature expression*. For instance, the transition 1 --free/f--> 3 is included in variants where the feature f is enabled.

By combining various features, a number of variants of this VendingMachine can be obtained. Recall that v and s are mandatory features. The set of valid configurations is thus: K_VM = {{v,s}, {v,s,t}, {v,s,c}, {v,s,t,c}, {v,s,f}, {v,s,t,f}, {v,s,c,f}, {v,s,t,c,f}}. Figure 2 shows the basic version of VendingMachine that only serves soda, described by the configuration {v,s} (or, as a formula, v ∧ s ∧ ¬t ∧ ¬c ∧ ¬f), that is, the projection π_{v,s}(VendingMachine). It takes a coin, returns change, serves soda, and opens a compartment so that the customer can take the soda, before closing it again.

Figure 3 shows an MTS. Must transitions are denoted by solid lines, and may transitions by dashed lines.

*CTL\* Properties.* Computation Tree Logic\* (CTL*) [2] is an expressive temporal logic for specifying system properties, which subsumes both the CTL and LTL logics. CTL* state formulae Φ are generated by the following grammar:

$$\Phi ::= true \mid a \in AP \mid \neg a \mid \Phi_1 \land \Phi_2 \mid \forall\phi \mid \exists\phi, \qquad \phi ::= \Phi \mid \phi_1 \land \phi_2 \mid \bigcirc\phi \mid \phi_1\,\mathsf{U}\,\phi_2$$

where φ ranges over CTL* path formulae. Note that the CTL* state formulae Φ are given in negation normal form (¬ is applied only to atomic propositions). Given Φ ∈ CTL*, we consider ¬Φ to be the equivalent CTL* formula given in negation normal form. Other derived temporal operators (path formulae) can be defined as syntactic sugar, for instance: ♦φ = *true* U φ (φ holds eventually) and □φ = ¬♦¬φ (φ always holds). ∀CTL* and ∃CTL* are the subsets of CTL* where the only allowed path quantifiers are ∀ and ∃, respectively.

We formalise the semantics of CTL* over a TS T. We write [[T]]^s_TS for the set of executions that start in state s; ρ[i] = s_i for the i-th state of the execution ρ; and ρ^i = s_i λ_{i+1} s_{i+1} ... for the suffix of ρ starting from its i-th state.

**Definition 5.** *Satisfaction of a state formula* Φ *in a state* s *of a TS* T*, denoted* T, s |= Φ*, is defined as (*T *is omitted when clear from context):*

**(1)** s |= a *iff* a ∈ L(s); s |= ¬a *iff* a ∉ L(s); **(2)** s |= Φ1 ∧ Φ2 *iff* s |= Φ1 *and* s |= Φ2; **(3)** s |= ∀φ *iff* ∀ρ ∈ [[T]]^s_TS. ρ |= φ; s |= ∃φ *iff* ∃ρ ∈ [[T]]^s_TS. ρ |= φ.

*Satisfaction of a path formula* <sup>φ</sup> *for an execution* <sup>ρ</sup> *of a TS* <sup>T</sup> *, denoted* <sup>T</sup> , ρ <sup>|</sup><sup>=</sup> <sup>φ</sup>*, is defined as (*T *is omitted when clear from context):*


**Definition 6.** *An FTS* F *satisfies a CTL\* formula* Φ*, written* F |= Φ*, iff all its valid variants satisfy the formula:* ∀k ∈ K. π_k(F) |= Φ*.*

The interpretation of CTL* over an MTS M differs slightly from Definition 5 above. In particular, clause (3) is replaced by:

**(3')** s |= ∀φ iff for every may-execution ρ starting in state s of M, that is, ∀ρ ∈ [[M]]^{may,s}_MTS, it holds that ρ |= φ; whereas s |= ∃φ iff there exists a must-execution ρ starting in state s of M, that is, ∃ρ ∈ [[M]]^{must,s}_MTS, such that ρ |= φ.

From now on, we implicitly assume this adapted definition when interpreting CTL* formulae over MTSs and MFTSs.

*Example 2.* Consider the FTS VendingMachine in Fig. 1. Suppose that the proposition start holds in the initial state 1. An example property Φ1 is ∀□∀♦start, which states that in every state along every execution all possible continuations will eventually reach the initial state. This formula is in ∀CTL*. Note that VendingMachine ⊭ Φ1. For example, if the feature c (CancelPurchase) is enabled, a counter-example where the state 1 is never reached is: 1 → 3 → 5 → 7 → 3 → .... The set of violating products is [[c]] = {{v,s,c}, {v,s,t,c}, {v,s,c,f}, {v,s,t,c,f}} ⊆ K_VM. However, π_[[¬c]](VendingMachine) |= Φ1.

Consider the property Φ2: ∀□∃♦start, which describes the situation where in every state along every execution there exists a possible continuation that will eventually reach the start state. This is a CTL* formula that is neither in ∀CTL* nor in ∃CTL*. Note that VendingMachine |= Φ2, since even for variants with the feature c there is a continuation from the state 3 back to 1.

Consider the ∃CTL* property Φ3: ∃□∃♦start, which states that there exists an execution such that in every state along it there exists a possible continuation that will eventually reach the start state. The witnesses are 1 → 2 → 3 → 5 → 7 → 8 → 1 ... for variants that satisfy ¬c, and 1 → 3 → 5 → 7 → 3 → 4 → 1 ... for variants with c.

## **3 Abstraction of FTSs**

We now introduce the variability abstractions, which preserve full CTL* as well as its universal and existential fragments. They simplify the configuration space of an FTS by reducing the number of configurations and manipulating the presence conditions of transitions. We start by working with Galois connections¹ between Boolean complete lattices of feature expressions, and then induce a notion of abstraction of FTSs. We define two classes of abstractions. We use the standard conservative abstractions [14,15] as an instrument to eliminate variability from the FTS in an *over-approximating* way, i.e. by adding executions. We use the dual abstractions, which also eliminate variability, but by *under-approximating* the given FTS, i.e. by dropping executions.

*Domains.* The Boolean complete lattice of feature expressions (propositional formulae over F) is: (*FeatExp*(F)/≡, |=, ∨, ∧, *true*, *false*, ¬). The elements of the domain *FeatExp*(F)/≡ are equivalence classes of propositional formulae ψ ∈ *FeatExp*(F), obtained by quotienting by the semantic equivalence ≡. The ordering |= is the standard entailment between propositional formulae, whereas the least upper bound and the greatest lower bound are logical disjunction and conjunction, respectively. Finally, the constant *false* is the least element, *true* is the greatest element, and negation is the complement operator.

*Conservative Abstractions.* The *join abstraction*, α^join, merges the control flow of all variants, obtaining a single variant that includes all executions occurring in any variant. The information about which transitions are associated with which variants is lost. Each feature expression ψ is replaced with *true* if at least one configuration from K satisfies ψ. The new abstract set of features is empty, α^join(F) = ∅, and the abstract set of valid configurations is the singleton α^join(K) = {*true*} (for K ≠ ∅). The abstraction and concretization functions between *FeatExp*(F) and *FeatExp*(∅), forming a Galois connection [14,15], are defined as:

$$\alpha^{\text{join}}(\psi) = \begin{cases} true & \text{if } \exists k \in \mathbb{K}.\ k \models \psi \\ false & \text{otherwise} \end{cases} \qquad \gamma^{\text{join}}(\psi) = \begin{cases} true & \text{if } \psi \text{ is } true \\ \bigvee_{k \in 2^{\mathbb{F}} \setminus \mathbb{K}} k & \text{if } \psi \text{ is } false \end{cases}$$

The *feature ignore abstraction*, α^fignore_A, introduces an over-approximation by ignoring a single feature A ∈ F. It merges the control-flow paths that differ only with regard to A, but keeps the precision with respect to control-flow paths that do not depend on A. The features and configurations of the abstracted model are: α^fignore_A(F) = F \ {A} and α^fignore_A(K) = {k[l_A ↦ *true*] | k ∈ K}, where l_A denotes a literal of A (either A or ¬A), and k[l_A ↦ *true*] is the formula resulting from k by

<sup>1</sup> (L, ≤_L) ⇄ (M, ≤_M) via (α, γ) is a *Galois connection* between complete lattices L (the concrete domain) and M (the abstract domain) iff α : L → M and γ : M → L are total functions that satisfy α(l) ≤_M m ⟺ l ≤_L γ(m) for all l ∈ L, m ∈ M. Here ≤_L and ≤_M are the ordering relations of L and M, respectively. We will often simply write (α, γ) for such a Galois connection.

substituting *true* for l_A. The abstraction and concretization functions between *FeatExp*(F) and *FeatExp*(α^fignore_A(F)), forming a Galois connection [14,15], are:

$$\alpha_A^{\text{fignore}}(\psi) = \psi[l_A \mapsto true] \qquad \gamma_A^{\text{fignore}}(\psi') = (\psi' \wedge A) \vee (\psi' \wedge \neg A)$$

where ψ and ψ′ need to be in negation normal form before the substitution.

*Dual Abstractions.* Suppose that ⟨*FeatExp*(F)/≡, |=⟩ and ⟨*FeatExp*(α(F))/≡, |=⟩ are Boolean complete lattices, and that ⟨*FeatExp*(F)/≡, |=⟩ ⇄ ⟨*FeatExp*(α(F))/≡, |=⟩ via (α, γ) is a Galois connection. We define [9]: α̃ = ¬ ∘ α ∘ ¬ and γ̃ = ¬ ∘ γ ∘ ¬, so that the same lattices ordered by the reversed entailment again form a Galois connection via (α̃, γ̃) (or, equivalently, ⟨*FeatExp*(α(F))/≡, |=⟩ ⇄ ⟨*FeatExp*(F)/≡, |=⟩ via (γ̃, α̃)). The Galois connections (α̃, γ̃) thus obtained are called the dual (under-approximating) abstractions of (α, γ).

The *dual join abstraction*, α̃^join, merges the control flow of all variants, obtaining a single variant that includes only those executions that occur in all variants. Each feature expression ψ is replaced with *true* if all configurations from K satisfy ψ. The abstraction and concretization functions between *FeatExp*(F) and *FeatExp*(∅), forming a Galois connection, are defined as α̃^join = ¬ ∘ α^join ∘ ¬ and γ̃^join = ¬ ∘ γ^join ∘ ¬, that is:

$$\widetilde{\alpha}^{\text{join}}(\psi) = \begin{cases} true & \text{if } \forall k \in \mathbb{K}.\ k \models \psi \\ false & \text{otherwise} \end{cases} \qquad \widetilde{\gamma}^{\text{join}}(\psi) = \begin{cases} \bigwedge_{k \in 2^{\mathbb{F}} \setminus \mathbb{K}} (\neg k) & \text{if } \psi \text{ is } true \\ false & \text{if } \psi \text{ is } false \end{cases}$$
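Read as predicates over the valid configurations, α^join and its dual reduce to an existential versus a universal check over K. The following sketch (illustrative only, not the paper's implementation) shows the over-/under-approximating pair side by side:

```python
# alpha_join(psi): true iff SOME valid configuration satisfies psi
# (over-approximation, used for may transitions).
def alpha_join(psi, K):
    return any(psi(k) for k in K)

# The dual abstraction: true iff ALL valid configurations satisfy psi
# (under-approximation, used for must transitions).
def alpha_join_dual(psi, K):
    return all(psi(k) for k in K)

K = [{"v", "s"}, {"v", "s", "c"}]
has_c = lambda k: "c" in k   # presence condition of an optional feature
has_v = lambda k: "v" in k   # presence condition of a mandatory feature

assert alpha_join(has_c, K) is True        # survives as a may transition
assert alpha_join_dual(has_c, K) is False  # dropped from must transitions
assert alpha_join_dual(has_v, K) is True   # survives as a must transition
```

This matches the intuition that transitions guarded by mandatory features end up in the must part of the abstract model, while optional-feature transitions survive only as may transitions.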

The *dual feature ignore abstraction*, α̃^fignore_A, introduces an under-approximation by ignoring the feature A ∈ F, such that the literals of A (that is, A and ¬A) are replaced with *false* in feature expressions (given in negation normal form). The abstraction and concretization functions between *FeatExp*(F) and *FeatExp*(α^fignore_A(F)), forming a Galois connection, are defined as α̃^fignore_A = ¬ ∘ α^fignore_A ∘ ¬ and γ̃^fignore_A = ¬ ∘ γ^fignore_A ∘ ¬, that is:

$$\widetilde{\alpha}_A^{\text{fignore}}(\psi) = \psi[l_A \mapsto false] \qquad \widetilde{\gamma}_A^{\text{fignore}}(\psi') = (\psi' \vee \neg A) \wedge (\psi' \vee A)$$

where ψ and ψ′ are in negation normal form.
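In a predicate encoding, ignoring a feature A amounts to quantifying it out: existentially for the over-approximating α^fignore_A (mirroring ψ[l_A ↦ true] on NNF formulas) and universally for the dual (mirroring ψ[l_A ↦ false]). A hypothetical sketch:

```python
# Sketch of the feature-ignore abstractions in a predicate encoding.
def fignore(psi, A):
    """Over-approximating (may): psi holds for some value of A."""
    return lambda k: psi(k | {A}) or psi(k - {A})

def fignore_dual(psi, A):
    """Under-approximating (must): psi holds for every value of A."""
    return lambda k: psi(k | {A}) and psi(k - {A})

psi = lambda k: "t" in k or "s" in k   # the feature expression t ∨ s
may = fignore(psi, "t")
must = fignore_dual(psi, "t")

assert may(frozenset()) is True        # enabling t could satisfy psi
assert must(frozenset()) is False      # psi fails when t is off and s absent
assert must(frozenset({"s"})) is True  # s alone satisfies psi either way
```

The quantifier view also makes the precision claim visible: configurations agreeing on everything but A collapse, while conditions independent of A are untouched.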

*Abstract MFTS and Preservation of CTL\*.* Given a Galois connection (α, γ) defined on the level of feature expressions, we now define the abstraction of an FTS as an MFTS with two transition relations: one (may) preserving universal properties, and the other (must) preserving existential properties. The may transitions describe behaviour that is possible but need not be realized in the variants of the family, whereas the must transitions describe behaviour that has to be present in every variant of the family.

**Definition 7.** *Given an FTS* F = (S, Act, trans, I, AP, L, F, K, δ)*, we define the MFTS* α(F) = (S, Act, trans_may, trans_must, I, AP, L, α(F), α(K), δ_may, δ_must) *to be its abstraction, where* δ_may(t) = α(δ(t))*,* δ_must(t) = α̃(δ(t))*,* trans_may = {t ∈ trans | δ_may(t) ≠ *false*}*, and* trans_must = {t ∈ trans | δ_must(t) ≠ *false*}*.*

Note that the degree of reduction is determined by the choice of abstraction and may hence be arbitrarily large. In the extreme case of the join abstraction, we obtain an abstract model with no variability in it; that is, α^join(F) is an ordinary MTS.

*Example 3.* Recall the FTS VendingMachine of Fig. 1 with the set of valid configurations K_VM (see Example 1). Figure 3 shows α^join(VendingMachine), where the allowed (may) part of the behavior includes the transitions associated with the optional features c, f, t in VendingMachine, whereas the required (must) part includes the transitions associated with the mandatory features v and s. Note that α^join(VendingMachine) is an ordinary MTS with no variability. The MFTS α^fignore_{t,f}(π_[[v ∧ s]](VendingMachine)) is shown in [12, Appendix B], see Fig. 8. It has the singleton set of features F = {c} and the limited variability K = {c, ¬c}, where the mandatory features v and s are enabled.

From an MFTS (resp., MTS) MF, we define two FTSs (resp., TSs), MF^may and MF^must, representing the may- and must-components of MF, i.e. its may and must transitions, respectively. Thus, we have [[MF^may]]_FTS = [[MF]]^may_MFTS and [[MF^must]]_FTS = [[MF]]^must_MFTS.

We now show that the abstraction of an FTS is sound with respect to CTL*. First, we give two helper lemmas stating that: for any variant k ∈ K that can execute a behavior, there exists an abstract variant k′ ∈ α(K) that executes the same may-behaviour; and for any abstract variant k′ ∈ α(K) that can execute a must-behavior, there exists a variant k ∈ K that executes the same behaviour².

**Lemma 1.** *Let* <sup>ψ</sup> <sup>∈</sup> *FeatExp*(F)*, and* <sup>K</sup> *be a set of valid configurations over* <sup>F</sup>*.*


**Lemma 2.**


As a result, every ∀CTL* (resp., ∃CTL*) property that is true for the may- (resp., must-) component of α(F) is true for F as well. Moreover, the MFTS α(F) preserves full CTL*.

**Theorem 1 (Preservation results).** *For any FTS* <sup>F</sup> *and* (α, γ)*, we have:*

**(∀CTL\*)** *For every* Φ ∈ ∀CTL\*: α(F)^may |= Φ ⟹ F |= Φ.
**(∃CTL\*)** *For every* Φ ∈ ∃CTL\*: α(F)^must |= Φ ⟹ F |= Φ.
**(CTL\*)** *For every* Φ ∈ CTL\*: α(F) |= Φ ⟹ F |= Φ.

<sup>2</sup> Proofs of all lemmas and theorems in this section can be found in [12, Appendix A].

Abstract models are designed to be conservative for the satisfaction of properties. In the case of refutation of a property, however, the counter-example found in the abstract model may be spurious (introduced by the abstraction) for some variants and genuine for the others. Which is the case can be established by checking which variants can execute the found counter-example.

**Fig. 3.** *α*join(VendingMachine).

Let Φ be a CTL* formula which is neither in ∀CTL* nor in ∃CTL*, and let MF be an MFTS. We verify MF |= Φ by checking Φ on the two FTSs MF^may and MF^must, and then combining the obtained results as specified below.

**Theorem 2.** *For every* Φ ∈ CTL\* *and MFTS* MF*, we have:*

$$\mathcal{MF} \models \Phi = \begin{cases} \mathit{true} & \text{if } \left(\mathcal{MF}^{may} \models \Phi \,\wedge\, \mathcal{MF}^{must} \models \Phi\right) \\ \mathit{false} & \text{if } \left(\mathcal{MF}^{may} \not\models \Phi \,\vee\, \mathcal{MF}^{must} \not\models \Phi\right) \end{cases}$$

Therefore, we can check a formula Φ which is neither in ∀CTL⋆ nor in ∃CTL⋆ on α(F) by running a model checker twice: once with the may-component of α(F) and once with the must-component of α(F). On the other hand, a formula Φ from ∀CTL⋆ (resp., ∃CTL⋆) is checked on α(F) by running a model checker only once, with the may-component (resp., must-component) of α(F).
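The combination rule of Theorem 2 amounts to a tiny decision procedure over the results of two model-checker runs. The sketch below assumes hypothetical Boolean inputs `phi_may` and `phi_must` standing for the results of checking Φ on MF^may and MF^must (e.g., two NuSMV invocations); it is an illustration, not part of the paper's tool:

```python
def check_family(phi_may: bool, phi_must: bool) -> str:
    """Combine two model-checker runs as in Theorem 2.

    phi_may  -- result of checking Phi on the may-component MF^may
    phi_must -- result of checking Phi on the must-component MF^must
    """
    if phi_may and phi_must:
        # MF |= Phi; by Theorem 1 (CTL*), F |= Phi then follows
        return "holds"
    # at least one component refutes Phi; the counter-example found on
    # the abstract model may still be spurious for some variants
    return "refuted on abstraction"
```

Note that a refutation on the abstraction still requires the counter-example analysis described above to decide which variants it is genuine for.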

The family-based model checking problem can be reduced to a number of smaller problems by partitioning the set of variants. Let the subsets K_1, K_2, ..., K_n form a *partition* of the set K. Then: F |= Φ iff π_{K_i}(F) |= Φ for all i = 1, ..., n. By using Theorem 1 (CTL⋆), we obtain the following result.

**Corollary 1.** *Let* K_1, K_2, ..., K_n *form a* partition *of* K*, and* (α_1, γ_1), ..., (α_n, γ_n) *be Galois connections. If* α_1(π_{K_1}(F)) |= Φ, ..., α_n(π_{K_n}(F)) |= Φ*, then* F |= Φ*.*

Therefore, given a suitable partitioning of K and the aggressive *α*join abstraction, all *α*join(π_{K_i}(F))^may and *α*join(π_{K_i}(F))^must are ordinary TSs, so the family-based model checking problem can be solved using existing single-system model checkers, with all the optimizations that these tools may already implement.
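Corollary 1 then suggests the obvious verification loop: under *α*join, each abstracted projection is an ordinary TS, so each check is one call to a single-system model checker. A sketch, where `model_check` is a placeholder for such a call:

```python
def verify_by_partition(abstract_projections, model_check):
    """F |= Phi if every abstracted projection alpha_i(pi_{K_i}(F))
    satisfies Phi (Corollary 1).

    abstract_projections -- the models alpha_i(pi_{K_i}(F)), one per block K_i
    model_check          -- placeholder for one single-system model-checker run
    """
    return all(model_check(p) for p in abstract_projections)
```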

*Example 4.* Consider the properties introduced in Example 2. Using the TS *α*join(VendingMachine)^may we can verify Φ_1 = ∀□∀♦start (Theorem 1, (∀CTL⋆)). We obtain the counter-example 1 → 3 → 5 → 7 → 3 ..., which is genuine for variants satisfying c. Hence, variants from [[c]] violate Φ_1. On the other hand, by verifying that *α*join(π_{[[¬c]]}(VendingMachine))^may satisfies Φ_1, we can conclude by Theorem 1, (∀CTL⋆) that variants from [[¬c]] satisfy Φ_1.

We can verify Φ_2 = ∀□∃♦start by checking the may- and must-components of *α*join(VendingMachine). In particular, we have *α*join(VendingMachine)^may |= Φ_2 and *α*join(VendingMachine)^must |= Φ_2. Thus, using Theorem 1, (CTL⋆) and Theorem 2, we have that VendingMachine |= Φ_2.

Using *α*join(VendingMachine)^must we can verify Φ_3 = ∃□∃♦start by finding the witness 1 → 2 → 3 → 5 → 7 → 8 → 1 .... By Theorem 1, (∃CTL⋆), we have that VendingMachine |= Φ_3.

## **4 Implementation**

We now describe an implementation of our abstraction-based approach for CTL model checking of variational systems in the context of the state-of-the-art NuSMV model checker [3]. Since it is difficult to use FTSs to directly model very large variational systems, we use a high-level modelling language, called fNuSMV. Then, we show how to implement projection and variability abstractions as syntactic transformations of fNuSMV models.

*A High-Level Modelling Language.* fNuSMV is a feature-oriented extension of the input language of NuSMV, which was introduced by Plath and Ryan [28] and subsequently improved by Classen [4]. A NuSMV model consists of a set of variable declarations and a set of assignments. The variable declarations define the state space, and the assignments define the transition relation of the finite state machine described by the given model. For each variable, there are assignments that define its initial value and its value in the next state, the latter given as a function of the variable values in the present state. Modules can be used to encapsulate and factor out recurring submodels. Consider the basic NuSMV model shown in Fig. 4a. It consists of a single variable x which is initialized to 0 and never changes its value. The property (marked by the keyword SPEC) is "∀♦(x ≥ k)", where k is a meta-variable that can be replaced with various natural numbers. For this model, the property holds when k = 0. In all other cases (k > 0), a counterexample is reported in which x stays 0.
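Fig. 4a itself is not reproduced in this text. A minimal NuSMV model of the shape just described (one variable x, initialized to 0 and never changed, with the SPEC instantiated at k = 1) might look as follows; this is a sketch of the described structure, not the paper's exact figure:

```
MODULE main
VAR
  x : 0..2;            -- a single state variable
ASSIGN
  init(x) := 0;        -- x is initialized to 0
  next(x) := x;        -- and never changes its value
SPEC AF (x >= 1)       -- "forall eventually (x >= k)" for k = 1; fails here
```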

The fNuSMV language [28] is based on superimposition. *Features* are modelled as self-contained textual units using a new FEATURE construct added to the NuSMV language. A feature describes the changes to be made to the given basic NuSMV model. It can introduce new variables into the system (in a section marked by the keyword INTRODUCE), override the definition of existing variables in the basic model and change the values of those variables when they are read (in a section marked by the keyword CHANGE). For example, Fig. 4b shows a FEATURE construct, called A, which changes the basic model in Fig. 4a. In particular, the feature A defines a new variable nA initialized to 0. The basic system is changed in such a way that when the condition "nA = 0" holds then in the next state the basic system's variable x is incremented by 1 and in this case (when x is incremented) nA is set to 1. Otherwise, the basic system is not changed.

Classen [4] shows that fNuSMV and FTS are expressively equivalent. He [4] also proposes a way of composing fNuSMV features with the basic model to create a single model in pure NuSMV which describes all valid variants. The information about the variability and features in the composed model is recorded in the states. This is a slight deviation from the encoding in FTSs, where this information is part of the transition relation. However, this encoding has the advantage of being implementable in NuSMV without drastic changes. In the composed model each feature becomes a Boolean state variable, which is non-deterministically initialised and whose value never changes. Thus, the initial states of the composed model include all possible feature combinations. Every change performed by a feature is guarded by the corresponding feature variable.

For example, the composition of the basic model and the feature A given in Figs. 4a and b results in the model shown in Fig. 4c. First, a module, called *features*, containing all features (in this case, the single feature A) is added to the system. To each feature (e.g. A) corresponds one variable in this module (e.g. fA). The *main* module contains a variable named f of type *features*, so that all feature variables can be referenced in it (e.g. f.fA). In the next state, the variable x is incremented by 1 when the feature A is enabled (fA is *TRUE*) and nA is 0. Otherwise (*TRUE:* can be read as *else:*), x is not changed. Also, nA is set to 1 when A is enabled and x is incremented by 1. The property ∀♦(x ≥ 0) holds for both variants, i.e., when A is enabled (fA is *TRUE*) and when A is disabled (fA is *FALSE*).
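Fig. 4c is likewise not reproduced here. A sketch of the composed model's shape, following the description above (a *features* module with one non-deterministically initialized variable fA, and case branches guarded by f.fA), could be:

```
MODULE features
VAR
  fA : boolean;                   -- one Boolean variable per feature
ASSIGN
  init(fA) := {TRUE, FALSE};      -- non-deterministic: all variants covered
  next(fA) := fA;                 -- the feature selection never changes

MODULE main
VAR
  f  : features;                  -- feature variables referenced as f.fA
  x  : 0..2;
  nA : 0..1;
ASSIGN
  init(x) := 0;
  next(x) := case
      f.fA & nA = 0 : x + 1;      -- feature A's change, guarded by fA
      TRUE          : x;          -- else: keep the base behaviour
    esac;
  init(nA) := 0;
  next(nA) := case
      f.fA & next(x) = x + 1 : 1; -- nA set to 1 once x was incremented
      TRUE                   : nA;
    esac;
SPEC AF (x >= 0)
```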

*Transformations.* We present the syntactic transformations of fNuSMV models defined by projection and variability abstractions. Let M represent a model obtained by composing a basic model with a set of features F. Let M contain a set of assignments of the form: s(v) := case b_1 : e_1; ... b_n : e_n; esac, where v is a variable, b_i is a Boolean expression, e_i is an expression (for 1 ≤ i ≤ n), and s(v) is one of v, init(v), or next(v). We denote by [[M]] the FTS for this model [4].

Let K′ ⊆ 2^F be a set of configurations described by a feature expression ψ′, i.e. [[ψ′]] = K′. The projection π_{[[ψ′]]}([[M]]) is obtained by adding the constraint ψ′ to each b_i in the assignments to the state variables.

Let (α, γ) be a Galois connection from Sect. 3. The abstract models α(M)^may and α(M)^must are obtained by the following rewrites of the assignments in M:

$$\begin{array}{l} \alpha\left(s(v) := \mathsf{case}\; b_1 : e_1; \ldots\, b_n : e_n; \mathsf{esac}\right)^{may} = s(v) := \mathsf{case}\; \alpha^m(b_1) : e_1; \ldots\, \alpha^m(b_n) : e_n; \mathsf{esac} \\ \alpha\left(s(v) := \mathsf{case}\; b_1 : e_1; \ldots\, b_n : e_n; \mathsf{esac}\right)^{must} = s(v) := \mathsf{case}\; \widetilde{\alpha}(b_1) : e_1; \ldots\, \widetilde{\alpha}(b_n) : e_n; \mathsf{esac} \end{array}$$

The functions α^m and α̃ copy all basic Boolean expressions other than feature expressions, and call themselves recursively on all sub-expressions of compound expressions. For *α*join(M)^may, we have a single Boolean variable rnd which is non-deterministically initialized; then α^m(ψ) = rnd if α(ψ) = *true*. We have: α([[M]])^may = [[α(M)^may]] and α([[M]])^must = [[α(M)^must]]. For example, given the composed model M in Fig. 4c, the abstractions *α*join(M)^may and *α*join(M)^must are shown in Figs. 5 and 6, respectively. Note that α̃join(f.fA) = *false*, so the first branch of the case statements in M is never taken in *α*join(M)^must.
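The essence of *α*join on feature expressions can be illustrated in a few lines of Python: the may-abstraction keeps a branch (α(ψ) = *true*, implemented via rnd in the model) iff ψ is satisfiable by some configuration, while the dual must-abstraction keeps it iff ψ holds in every configuration. The encoding of ψ as a Python predicate over a dict of feature values is purely illustrative (valid-configuration constraints are ignored):

```python
from itertools import product

def all_configs(features):
    """Enumerate all 2^|F| feature assignments as dicts."""
    for vals in product([False, True], repeat=len(features)):
        yield dict(zip(features, vals))

def alpha_join_may(psi, features):
    """May-abstraction of a feature expression: true iff psi is satisfiable."""
    return any(psi(c) for c in all_configs(features))

def alpha_join_must(psi, features):
    """Dual must-abstraction: true iff psi holds in every configuration."""
    return all(psi(c) for c in all_configs(features))
```

For instance, with a single feature A, the expression f.fA is satisfiable but not valid, so the guarded branch survives in the may-abstraction but is never taken in the must-abstraction, matching the observation about Fig. 6 above.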

**Fig. 5.** *α*join(M)^may. **Fig. 6.** *α*join(M)^must.
## **5 Evaluation**

We now evaluate our abstraction-based verification technique. First, we show how variability abstractions can turn a previously infeasible analysis of a variability model into a feasible one. Second, we show that instead of verifying CTL properties using the family-based version of NuSMV [7], we can use variability abstraction to obtain an abstract variability model (with a low number of variants) that can subsequently be model checked using the standard version of NuSMV.

All experiments were executed on a 64-bit Intel Core i7-4600U CPU running at 2.10 GHz with 8 GB memory. The implementation, benchmarks, and all results obtained from our experiments are available from: https://aleksdimovski.github.io/abstract-ctl.html. For each experiment, we report the time needed to perform the verification task in seconds. The BDD model checker NuSMV is run with the parameters -df -dynamic, which ensure that the BDD package reorders the variables during verification in case the BDD size grows beyond a certain threshold.

*Synthetic Example.* As an experiment, we have tested the limits of family-based model checking with extended NuSMV and of "brute-force" single-system model checking with standard NuSMV (where all variants are verified one by one). We gradually added variability to the variational model in Fig. 4 by adding optional features, each of which increases the basic model's variable x by the number corresponding to the given feature. For example, the CHANGE section for the second feature B is: IF (nB = 0) THEN IMPOSE next(x) := x + 2; next(nB) := next(x) = x + 2 ? 1 : nB, and the domain of x is 0..3.

We check the assertion ∀♦(x ≥ 0). For |F| = 25 (for which there are |K| = 2^25 variants, and the state space is 2^32), the family-based NuSMV takes around 77 min to verify the assertion, whereas for |F| = 26 it did not finish the task within two hours. The analysis time to check the assertion using "brute force" with standard NuSMV ascends to almost three years for |F| = 25. On the other hand, if we apply the variability abstraction *α*join, we are able to verify the same assertion with only one call to standard NuSMV on the *abstracted* model: in 2.54 s for |F| = 25 and in 2.99 s for |F| = 26.

**Elevator.** The Elevator, designed by Plath and Ryan [28], contains about 300 LOC and 9 independent features: Antiprank, Empty, Exec, OpenIfIdle, Overload, Park, QuickClose, Shuttle, and TTFull, thus yielding 2^9 = 512 variants. The elevator serves a number of floors (five in our case), with a single platform button on each floor which calls the elevator. The elevator always serves all requests in its current direction before it stops and changes direction. When serving a floor, the elevator door opens and closes again. The size of the Elevator model is 2^28 states, whereas the sizes of *α*join(Elevator)^may and *α*join(Elevator)^must are 2^20 and 2^19 states, respectively.

We consider five properties. The ∀CTL property "Φ_1 = ∀□(floor = 2 ∧ liftBut5.pressed ∧ direction = up =⇒ ∀[direction = up U floor = 5])" states that, when the elevator is on the second floor with direction up and the button for floor five is pressed, the elevator will go up until the fifth floor is reached. This property is violated by variants for which Overload (the elevator refuses to close its doors when it is overloaded) is enabled. Given sufficient knowledge of the system and the property, we can tailor


**Fig. 7.** Verification of Elevator properties using tailored abstractions. We compare family-based approach vs. abstraction-based approach.

an abstraction for verifying this property more effectively. We call standard NuSMV to check Φ_1 on the two models *α*join(π_{[[Overload]]}(Elevator))^may and *α*join(π_{[[¬Overload]]}(Elevator))^may. For the first abstracted projection we obtain an "abstract" counter-example violating Φ_1, whereas the second abstracted projection satisfies Φ_1. Similarly, we can verify that the ∀CTL property "Φ_2 = ∀□(floor = 2 ∧ direction = up =⇒ ∀○(direction = up))" is satisfied only by variants with Shuttle enabled (the lift changes direction only at the first and last floor). We can successfully verify Φ_2 for *α*join(π_{[[Shuttle]]}(Elevator))^may and obtain a counter-example for *α*join(π_{[[¬Shuttle]]}(Elevator))^may. The ∃CTL property "Φ_3 = (OpenIfIdle ∧ ¬QuickClose) =⇒ ∃♦(∃□(door = open))" states that there exists an execution such that, from some state on, the door stays open. We can invoke standard NuSMV to verify that Φ_3 holds for *α*join(π_{[[OpenIfIdle∧¬QuickClose]]}(Elevator))^must. The following two properties are neither in ∀CTL nor in ∃CTL. The property "Φ_4 = ∀□(floor = 1 ∧ idle ∧ door = closed =⇒ ∃□(floor = 1 ∧ door = closed))" states that, on every execution, globally, if the elevator is on the first floor, idle, and its door is closed, then there is a continuation where the elevator stays on the first floor with the door closed. The satisfaction of Φ_4 can be established by verifying it against both *α*join(Elevator)^may and *α*join(Elevator)^must, using two calls to standard NuSMV. The property "Φ_5 = Park =⇒ ∀□(floor = 1 ∧ idle =⇒ ∃[idle U floor = 1])" is satisfied by all variants with Park enabled (when idle, the elevator returns to the first floor).
We can successfully verify Φ_5 by analyzing *α*join(π_{[[Park]]}(Elevator))^may and *α*join(π_{[[Park]]}(Elevator))^must, using two calls to standard NuSMV. We can see in Fig. 7 that the tailored abstractions achieve significant speed-ups, between 2.5 and 32 times faster than the family-based approach.

#### **6 Related Work and Conclusion**

Recently, many family-based techniques that work on the level of variational systems have been proposed. These include family-based syntax checking [20,25], family-based type checking [24], family-based static program analysis [16,17,27], and family-based verification [22,23,29]. In the context of family-based model checking, Classen et al. present FTSs [6] and specifically designed family-based model checking algorithms for verifying FTSs against LTL [5]. This approach is extended [4,7] to enable verification of CTL properties using a family-based version of NuSMV. In order to make this family-based approach more scalable, the works [15,21] propose applying conservative variability abstractions to FTSs for deriving abstract family-based model checking of LTL. An automatic abstraction refinement procedure for family-based model checking is then proposed in [19]. The application of variability abstractions for verifying real-time variational systems is described in [18]. The works [11,13] present an approach for family-based software model checking of #ifdef-based (second-order) program families using symbolic game semantics models [10].

To conclude, we have proposed conservative (over-approximating) variability abstractions and their duals (under-approximating) to derive abstract family-based model checking that preserves the full CTL⋆. The evaluation confirms that interesting properties can be efficiently verified in this way. In this work, we assume that a suitable abstraction is generated manually before verification. To make the whole verification procedure automatic, we need to develop an abstraction and refinement framework for CTL⋆ properties, similar to the one in [19], which is designed for LTL.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **FPH: Efficient Non-commutativity Analysis of Feature-Based Systems**

Marsha Chechik¹(B), Ioanna Stavropoulou¹, Cynthia Disenfeld¹, and Julia Rubin²

> ¹ University of Toronto, Toronto, Canada
> {chechik,ioanna,disenfeld}@cs.toronto.edu
> ² University of British Columbia, Vancouver, Canada
> mjulia@ece.ubc.ca

**Abstract.** *Feature-oriented software development (FOSD)* is a promising approach for developing a collection of similar software products from a shared set of software assets. A well-recognized issue in FOSD is the analysis of *feature interactions*: cases where the integration of multiple features would alter the behavior of one or several of them. Existing approaches to feature interaction detection require a fixed order in which the features are to be composed, but do not provide guidance as to how to define this order or how to determine a relative order of a newly-developed feature w.r.t. existing ones. In this paper, we argue that classic feature non-commutativity analysis, i.e., determining when an order of composition of features affects properties of interest, can be used to complement feature interaction detection to help build orders between features and determine many interactions. To this end, we develop and evaluate Mr. Feature Potato Head (FPH), a modular approach to non-commutativity analysis that does not rely on temporal properties and applies to systems expressed in Java. Our experiments running FPH on 29 examples show its efficiency and effectiveness.

## **1 Introduction**

*Feature-oriented software development (FOSD)* [3] is a promising approach for developing a collection of similar software products from a shared set of software assets. In this approach, each feature encapsulates a certain unit of functionality of a product; features are developed and tested independently and then integrated with each other; developed features are then combined in a prescribed manner to produce the desired set of products. A well-recognized issue in FOSD is that it is prone to creating *feature interactions* [2,13,22,28]: cases where integrating multiple features alters the behavior of one or several of them. Not all interactions are desirable. E.g., the Night Shift feature of the recent iPhone did not allow the Battery Saver to be enabled (and the interaction was not fixed for over 2 months, potentially affecting millions of iPhone users). More critically, in 2010, Toyota had to recall hundreds of thousands of Prius cars due to an interaction between the regenerative braking system and the hydraulic braking system that caused 62 crashes and 12 injuries.

Existing approaches for identifying feature interactions either require an explicit order in which the features are to be composed [6,8,18,19,26] or assume presence of a "150%" representation which uses an implicit feature order [12,15]. Yet they do not provide guidance on how to define this order, or how to determine a relative order of a newly-developed feature w.r.t. existing ones.

A classical approach to feature non-commutativity detection, defined by Plath and Ryan [25], can be used to help build a composition order. The authors defined non-commutativity as "the presence of a property, the value of which is different depending on the order of the composition of the features" and proposed a model-checking approach allowing one to check available properties on different composition orders. E.g., consider the Elevator System [14,25] consisting of five features: *Empty* – to clear the cabin buttons when the elevator is empty; *ExecutiveFloor* – to override the value of the variable stop to give priority to the executive floor (not stopping in the middle); *TwoThirdsFull* – to override the value of stop, not allowing people to get into the elevator when it is two-thirds full; *Overloaded* – to disallow closing of the elevator doors while it is overloaded; and *Weight* – to allow the elevator to calculate the weight of the people inside the cabin. Features *TwoThirdsFull* and *ExecutiveFloor* are not commutative (e.g., a property "the elevator does not stop at other floors when there is a call from the executive floor" changes value under different composition orders), whereas *Empty* and *Weight* are. Thus, an order between *Empty* and *Weight* is not required, whereas the user needs to determine which of *TwoThirdsFull* or *ExecutiveFloor* should get priority. Thus, *feature non-commutativity guarantees a feature interaction, whereas feature commutativity means that the order of composition does not matter. Both of these outcomes can effectively complement other feature interaction approaches.*

In this paper, we aim to make commutativity analysis practical and applicable to a broad range of modern feature-based systems, so that it can be used as "the first line of defense" before running other feature interaction detections. There are three main issues we need to tackle. First of all, to prove that features commute requires checking their composition against all properties, and capturing the complete behavior of features in the form of formal specifications is an infeasible task. Thus, we aim to make our approach *property-independent*. Second, we need to make commutativity analysis *scalable* and avoid rechecking the entire system every time a single feature is modified or a new one is added. Finally, we need to support analysis of systems expressed in modern programming languages such as Java.

In [25], features execute "atomically" in a state-machine representation of the system, i.e., they make all state changes in one step. However, when systems are represented in conventional programming languages like Java, feature execution may take several steps; furthermore, such features are composed *sequentially*, using *superimposition* [5]. Examining properties defined by researchers studying such systems [6], we note that they do not refer to intermediate states within the feature execution, but only to states before or after running the feature, effectively treating features as atomic. In this paper, we use this notion of atomicity to formalize commutativity. The foundation of our technique is the separation between feature behavior and feature composition, and an efficient check of whether different feature composition orders leave the system in the same internal state. Otherwise, a property distinguishing between the orders can be found, and thus the features do not commute. We call the technique and the accompanying tool Mr. Feature Potato Head (*FPH*), named after the kids' toy which can be composed from interchangeable parts.

In this paper, we show that FPH can perform commutativity analysis in an efficient and precise manner. It performs a modular checking of *pairs of features* [17], which makes the analysis very scalable: when a feature is modified, the analysis can focus only on the interactions related to that feature, without needing to consider the entire family. That is, once the initial analysis is completed, a partial order between the features of the given system can be created and used for detecting other types of interactions. Any feature added in the future will be checked against all other features for non-commutativity-related interactions to define its order among the rest of the features, but the existing order would not be affected. In this paper, we only focus on the non-commutativity analysis and consider interaction *resolution* as being out of scope.

**Contributions.** This paper makes the following contributions: (1) It defines commutativity for features expressed in imperative programming languages and composed via superimposition. (2) It proposes a novel modular representation for features that distinguishes between feature composition and behavior. (3) It defines and implements a modular specification-free feature commutativity analysis that focuses on pairs of features rather than on complete products or product families. (4) It instantiates this analysis on features expressed in Java. (5) It shows that the implemented analysis is effective for detecting instances of non-commutativity as well as proving their absence. (6) It evaluates the efficiency and scalability of the approach.

The rest of the paper is organized as follows. We provide the necessary background, fix the notation and define the notion of commutativity in Sect. 2. In Sect. 3, we describe our iterative tool-supported methodology for detecting feature non-commutativity for systems expressed in Java. We evaluate the effectiveness and scalability of our approach in Sect. 4, compare our approach to related work in Sect. 5 and conclude in Sect. 6<sup>1</sup>.

#### **2 Preliminaries**

In this section, we present the basic concepts and definitions and define the notion of commutativity used throughout this paper.

<sup>1</sup> The complete replication package, including the tool binary, the case studies used in our experiments, and proofs of selected theorems, is available at https://github.com/FeaturePotatoHead/FPH.


**Fig. 1.** Java code snippet of the feature *ExecutiveFloor*.

**Feature-Oriented Software Development (FOSD).** In FOSD, *products* are specified by a set of features (*configuration*). A *base system* has no features. While defining the notion of a feature is an active research topic [11], in this paper we assume that a feature is *"a structure that extends and modifies the structure of a given program in order to satisfy a stakeholder's requirement, to implement a design decision and to offer a configuration option"* [5].

**Superimposition.** *Superimposition* is a feature composition technique that composes software features by merging their corresponding substructures. Based on superimposition, Apel et al. [5] propose a composition technique where different components are represented using a uniform and language-independent structure called a *feature structure tree (FST)*. An *FST* is a tree T ::= (Terminal Node) | (Non Terminal Node)(Tree T)+, where + denotes "one or more". A *Non Terminal Node* is a tuple ⟨*name*, *type*⟩ which represents a non-leaf element of T with the respective name and type. A *Terminal Node* is a tuple ⟨*name*, *type*, *body*⟩ which represents a leaf element of T. In addition to *name* and *type*, each *Terminal Node* has a *body* that encapsulates the content of the element, i.e., the corresponding method implementation or field initializer. A *feature* is a tuple f = ⟨*name*, T⟩, where *name* is a string representing f's name and T is an FST abstractly representing f.

Each feature describes the modifications that need to be made to the base system, also represented by an FST, to enable the behavior of the feature. While FSTs are generally language-independent, in this paper we focus on features defined in a Java-based language. For example, consider the Java code snippet in Fig. 1, which shows the *ExecutiveFloor*. This feature makes one of the floors "an executive one". If there is a call to or from this floor, it gets priority over any other call. This feature is written in Java using a special keyword original [5] (line 9). Under this composition, a call from the new method to every existing method with the same name is added, in order to preserve the original behavior. Without original, new methods replace existing ones.

The feature *ExecutiveFloor* in Fig. 1 is represented by the tuple ⟨*executive*, T⟩, where T is the FST in Fig. 2. ElevatorSystem is a Non Terminal Node that represents the ElevatorSystem package with the tuple ⟨ElevatorSystem, package⟩, and stopRequestedInDirection is a Terminal Node represented by ⟨stopRequestedInDirection, method, *body*⟩, where *body* is the content of the stopRequestedInDirection method in Fig. 1 (lines 8–9). Another Non Terminal Node is Elevator, whereas executiveFloor, isExecutiveFloor, isExecutiveFloorCalling and stopRequestedAtCurrentFloor are Terminal.

**Fig. 2.** FST representation for the feature *ExecutiveFloor*.

**Fig. 3.** Simplified composition of *ExecutiveFloor* and the base elevator system.

For Java-specified features, Terminal Nodes represent methods, fields, import statements, modifier lists, as well as extends, implements and throws clauses whereas directories, files, packages and classes are represented by Non Terminals.

**Superimposition Process.** Given two FSTs, starting from the root and proceeding recursively to create a new FST, two nodes are composed when they share the same name and type and when their parent nodes have been composed. For Terminal Nodes which additionally have a body, if a Node *A* is composed with a Node *B*, the body of *A* is replaced by that of *B* unless the keyword original is present in the body of *B*. In this case, the body of *A* is replaced by that of *B* and the keyword is replaced by *A*'s body. Since the original keyword is not used for fields, the body of the initial field is always replaced by that of the new one.

Figure 3 gives an example of a composition of a simplified *ExecutiveFloor* feature with the elevator base system. Terminal Nodes that have been overridden by the feature are shown with a dashed outline, and new fields and methods added by the feature are shown as shaded nodes. For example, the method stopRequested, which is part of the base system, is overridden by the feature, whereas the field executiveFloor, which is only part of the feature, is added to the base system.
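The superimposition process described above can be sketched as a short recursion over a toy FST encoding (a Python dataclass with name, type, an optional body for Terminal Nodes, and children; the encoding is ours, not FeatureHouse's):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    type: str
    body: Optional[str] = None            # Terminal Nodes carry a body
    children: List["Node"] = field(default_factory=list)

def superimpose(a: Node, b: Node) -> Node:
    """Compose feature FST b onto FST a; a and b share name and type."""
    if a.body is not None and b.body is not None:      # two Terminal Nodes
        if "original" in b.body:
            # b overrides a, but a's body is spliced where 'original' occurs
            return Node(a.name, a.type, b.body.replace("original", a.body))
        return Node(a.name, a.type, b.body)            # plain replacement
    merged = {(c.name, c.type): c for c in a.children}
    for c in b.children:                               # compose matching kids
        key = (c.name, c.type)
        merged[key] = superimpose(merged[key], c) if key in merged else c
    return Node(a.name, a.type, children=list(merged.values()))
```

Composing a base method body `stop();` with a feature body `original check();` yields `stop(); check();`, while a feature body without original simply replaces the base one.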

**Commutativity.** We define commutativity w.r.t. properties observable before or after features finish their execution (such as those in [6]). A *state* of the system after superimposing a feature is the valuation of each variable (or array, object, field, etc. [24]) of the base system and each variable (or array, etc.) introduced by the feature. We also add a new variable *inBase* which is *true* iff this state is not within a method overridden by any feature. In the rest of the paper, we refer to states where *inBase* is *true* as *inBase* states. A *transition* of the system is an execution of a statement, including method calls and return statements [24].

Then we say that two features *commute* if they preserve the valuation of properties of the form G(*inBase* ⇒ φ), where φ is a propositional formula defined over any system state variables. That is, they do not commute if there is at least one state of the base system which changes depending on the order in which the features are composed. For example, the property "the elevator does not stop at other floors when there is a call from the executive floor", used in Sect. 1 to identify non-commutativity between features *TwoThirdsFull* and *ExecutiveFloor*, is G(*inBase* ⇒ ¬(*isExecutiveFloorCalling* ∧ *stopped* ∧ *floor*=*executiveFloor*)).

## **3 Methodology**

Our goal is to provide a scalable technique for determining whether features commute by establishing whether the two different composition orders leave the system in the same internal state. The workflow of FPH is shown below. The first step of FPH is to transform each feature from an FST into an FPH representation consisting of a set of fragments. The base is transformed in the same way as the individual features. Each fragment is further split into feature behavior and feature composition – see Sect. 3.1. Afterwards, we check for non-commutativity. If there do not exist feature fragments that have a *shared location* of composition, i.e., whose feature composition components are the same, then the features commute. Otherwise, we check the pairs of feature fragments for *behavior preservation*, i.e., whether, when the two features are composed in the same location, the previous behavior is still present and can be executed. If this check succeeds, we perform the *shared variables* check – see Sect. 3.2.

## **3.1 Separating Feature Behavior and Composition**

We now formally define the FPH representation of features that separates the behavior of features and location of their composition and provide transformation operators between the FPH and the FST representations.

**Definition 1.** *An FPH feature is a tuple* ⟨name, fragments⟩*, where* name *is the feature name and* fragments *is the list of feature fragments that comprise the feature. Let a feature* f *be given. A* Feature Fragment fg *is a tuple* ⟨fb, fc⟩*, where* fb *is a feature behavior defined in Definition* 2 *and* fc *is a feature composition defined in Definition* 3*.*

**Definition 2.** Feature Behavior fb *of a feature fragment* fg *is a tuple* ⟨name, type, body, bp, vars⟩*, where* name*,* type *and* body *represent the name, type and content, respectively, of the element represented by* fg*.* bp *is a boolean value which is set to* true *if the feature preserves the original behavior, i.e., when the keyword* original *is present in the body and not within a conditional statement.* vars *is a list of variable names read or written within* fg*.*

**Definition 3.** Feature composition fc *of a feature fragment* fg *is represented by* location *which is the path leading to the terminal node represented by* fg*.*

The *Separate* operator (see Fig. 4a) transforms features from the FST to the FPH representation by creating a new fragment for each Terminal Node in the given FST. For the behavior component of the fragment, its *name*, *type* and *body* attributes come from the respective counterparts of the FST Terminal Node. The *bp* field is *true* if every path within *body* contains the keyword original; otherwise, it is *false*. For the composition component, the *location* field gets its value from the unique path to the Terminal Node from the root of the FST. *vars* are the parameters of the method and the fields that are used within it.

E.g., consider creating the FPH representation for the *ExecutiveFloor* feature in Fig. 2. Since there are five Terminal Nodes, five fragments will be created, one per node. In the fragment created for the stopRequestedInDirection node, the information in *fb* about *name*, *type* and *body* is derived from the information stored in the node: *fb* = ⟨stopRequestedInDirection, method, [body]⟩, where body consists of lines 8–9 of Fig. 1. *bp* is *false* since the keyword original is within an if statement, and *vars* consists only of the method parameters since the method does not use any global fields. After separating, the feature composition is fc = ElevatorSystem.Elevator.stopRequestedInDirection.
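The *Separate* operator can be sketched in Python as follows. This is a simplified illustration (the real implementation builds on FeatureHouse's Java parser); in particular, the conditional-statement test for *bp* and the variable extraction are placeholders, not the tool's actual analyses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FSTNode:
    name: str
    type: str
    body: str = ""
    children: List["FSTNode"] = field(default_factory=list)

@dataclass
class FeatureBehavior:                   # the fb component of Definition 2
    name: str
    type: str
    body: str
    bp: bool                             # behavior-preserving?
    vars: List[str]

def separate(root: FSTNode) -> List[Tuple[FeatureBehavior, str]]:
    """Return one (fb, fc) fragment per Terminal Node; fc is the dotted path."""
    fragments = []
    def walk(node, path):
        loc = path + [node.name]
        if not node.children:            # Terminal Node
            # placeholder check: the real analysis verifies that 'original'
            # occurs on every path, i.e., not inside a conditional statement
            bp = "original" in node.body and "if" not in node.body
            fb = FeatureBehavior(node.name, node.type, node.body, bp, vars=[])
            fragments.append((fb, ".".join(loc)))
        for c in node.children:
            walk(c, loc)
    walk(root, [])
    return fragments
```

Running this on the FST of Fig. 2 would yield one fragment per Terminal Node, with fc paths such as ElevatorSystem.Elevator.stopRequestedInDirection.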

To transform features from FPH back to FST, we define the *Join* operator. It takes as input a list of feature fragments and returns an FST (see Fig. 4b).

**Fig. 4.** Algorithms *Separate*, *Join* and *CheckCommutativity*.

It creates a new Terminal Node to be added to the FST for each feature fragment in the given feature. The *name*, *type* and *body* attributes of the node are filled using the corresponding fields in the feature behavior component of the fragment. Then, starting from the root node, for every node in the *location* path of the feature composition component, if the node does not exist in the FST, it is added; otherwise, the next node of the path is examined. The information about *bp* and *vars* is already contained in the body of the Terminal Node and is no longer considered as a separate field. E.g., joining the *ExecutiveFloor* feature that we previously separated yields the FST in Fig. 2, as expected.
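A minimal sketch of *Join* in Python follows. For brevity, each fragment here is a ((name, type, body), location) pair rather than the full fb component, and the types of inner nodes are hard-coded assumptions; in the actual representation the location path records them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FSTNode:
    name: str
    type: str
    body: str = ""
    children: List["FSTNode"] = field(default_factory=list)

def join(fragments: List[Tuple[Tuple[str, str, str], str]]) -> FSTNode:
    """Rebuild an FST from fragments of the form ((name, type, body), location)."""
    root = None
    for (name, type_, body), location in fragments:
        path = location.split(".")       # the last element is the Terminal Node
        if root is None:
            root = FSTNode(path[0], "package")   # assumed inner-node type
        node = root
        for part in path[1:-1]:          # walk/extend the inner path
            child = next((c for c in node.children if c.name == part), None)
            if child is None:
                child = FSTNode(part, "class")   # assumed inner-node type
                node.children.append(child)
            node = child
        node.children.append(FSTNode(name, type_, body))
    return root
```

Joining the fragments of the previously separated *ExecutiveFloor* feature would recreate the ElevatorSystem/Elevator hierarchy with its Terminal Nodes attached.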

**Theorem 1.** *Let* n *be the number of features in a system. For every feature* F *which can be represented as* (fb, fc)*,* Join *and* Separate *are inverses of each other, i.e.,* Join*(*Separate*(*F*)) =* F *and* Separate*(*Join*(*fb, fc*)) = (*fb, fc*).*

## **3.2 Compositional Analysis of Non-commutativity**

We now formally present the algorithm *check commutativity*, a sequence of increasingly more precise, and more expensive, static checks that perform non-commutativity analysis. These are called *shared location*, *behavior preservation* and *shared variables* – see Fig. 4c. Additionally, we prove soundness and correctness of the FPH methodology, i.e., that our checks guarantee feature commutativity as defined in Sect. 2.

**Check Shared Location.** The first check examines whether F<sub>1</sub> and F<sub>2</sub> have any fragments that can be composed in the same location (line 3). Clearly, when F<sub>1</sub> and F<sub>2</sub> are applied in different places, e.g., they change different methods, *inBase* states are the same independently of their order of composition, and thus the features commute. Otherwise, more precise checks are required. E.g., *ExecutiveFloor* (see Fig. 2) and *Empty* (see Fig. 5a) do not share methods or fields and thus can be applied in either order.

**Theorem 2.** *If features* F<sub>1</sub> *and* F<sub>2</sub> *are not activated in the same location, any* inBase *state resulting from first composing* F<sub>1</sub> *followed by* F<sub>2</sub> *(denoted* F<sub>1</sub>; F<sub>2</sub>*) is the same as for* F<sub>2</sub>; F<sub>1</sub>*.*

**Check Behavior Preservation.** Suppose one pair of feature fragments of F<sub>1</sub> and F<sub>2</sub>, say, f<sub>1</sub> and f<sub>2</sub>, can be composed in the same location. Then we examine whether the original behavior is preserved or overridden (indicated by the *fb*

**Fig. 5.** Two features of the elevator system.

field of each fragment). If bp of f<sup>1</sup> and f<sup>2</sup> is *true*, an additional check for shared variables is applied. Otherwise, i.e., when bp of either f<sup>1</sup> or f<sup>2</sup> is *false*, we report an interaction. Clearly, this check can introduce false positives because we do not look at the content of the methods but merely at the presence of the original keyword. E.g., two methods may happen to perform the exact same operation and yet not include the original keyword. In this case, we would falsely detect an interaction2.

**Check Shared Variables.** If F<sub>1</sub> and F<sub>2</sub> are activated at the same place and both preserve the original behavior, commutativity of their composition depends on whether they have shared variables that can be both read and written. This check aims to detect that. E.g., both features *Empty* (see Fig. 5a) and *Weight* (see Fig. 5b) modify the leaveElevator method and preserve the original behavior. Since they share no variables, the order of composition does not affect the execution of the resulting system.

Extracting shared variable information requires not only identifying which variable is part of each feature behavior, but also running points-to analysis since aliasing is very common in Java. Moreover, a shared variable might not appear in the body of the affected method but instead in the body of a method called by it. Yet existing frameworks for implementing interprocedural points-to analyses [21] may not correctly identify all variables read and written within a method. Moreover, even if two features do write to the same location, this may not manifest a feature interaction. E.g., they may write the same value. For these reasons, our shared variables check may introduce false positives and false negatives. We evaluate its precision in Sect. 4.

**Theorem 3.** *Let features* F<sub>1</sub> *and* F<sub>2</sub>*, activated at the same place and preserving the behavior of the base, be given. If the variables read and written by each feature are correctly identified and independent of each other (*F<sub>1</sub>*.vars* ∩ F<sub>2</sub>*.vars =* ∅*), then any* inBase *state resulting from composing* F<sub>1</sub>; F<sub>2</sub> *is the same as that of composing* F<sub>2</sub>; F<sub>1</sub>*.*

When two features merely read the same variable, it does not present an interaction problem. We handle this case in our implementation (see Sect. 4).
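The staged pairwise check can be sketched as follows. This is an illustrative Python sketch of the checks described above, not the tool's actual script; the fragment fields mirror Definitions 2 and 3, and the read/write sets are assumed to be supplied by the points-to analysis.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Fragment:
    location: str        # fc: path of the Terminal Node being composed
    bp: bool             # fb.bp: is 'original' present on every path?
    reads: Set[str]      # variables read (assumed output of the alias analysis)
    writes: Set[str]     # variables written

def check_commutativity(f1: List[Fragment], f2: List[Fragment]) -> str:
    """Report 'commute' only when no staged check flags a possible interaction."""
    for a in f1:
        for b in f2:
            if a.location != b.location:
                continue                     # check 1: no shared location
            if not (a.bp and b.bp):
                return "interaction"         # check 2: behavior not preserved
            # check 3: a variable written by one and read or written by the other
            if (a.writes & (b.reads | b.writes)) or (b.writes & (a.reads | a.writes)):
                return "interaction"
    return "commute"
```

Note that two fragments that merely read the same variable pass check 3, matching the handling of read-only sharing described above.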

**Theorem 4 (Soundness).** *Given features* F<sub>1</sub> *and* F<sub>2</sub>*, if variables read and written by them are correctly identified, the algorithm in Fig. 4c is sound: when it outputs Success,* F<sub>1</sub> *and* F<sub>2</sub> *commute.*

**Complexity.** Let |F| be the number of features in the system and let M be the largest number of fragments that each feature can have. For a pair of feature fragments, checking shared location and checking behavior preservation are both done in constant time, so the overall complexity of these steps is O((|F| × M)<sup>2</sup>). In the worst case, all features affect the same set of methods and thus the shared variables check should be run on all of them. Yet, all fragments in a feature are non-overlapping, and thus the number of these checks is at most |F|<sup>2</sup> × M.

<sup>2</sup> But this does not happen often – see Sect. 4.

The time to perform a shared variable check, which we denote by SV, can vary depending on the implementation and can be as expensive as PSPACE-hard. Thus, the overall complexity of non-commutativity detection is O((|F| × M)<sup>2</sup> + SV × |F|<sup>2</sup> × M).

## **4 Evaluation**

In this section, we present an experimental evaluation of FPH, aiming to answer the following research questions: **(RQ1)** How effective is FPH in performing non-commutativity analysis of feature-based systems? **(RQ2)** How accurate is FPH's non-commutativity analysis? **(RQ3)** How efficient is FPH compared to state-of-the-art tools for performing non-commutativity analysis? **(RQ4)** How well does FPH scale as the number of fragments increases?

**Tool Support.** We have implemented our methodology (Sect. 3) as follows. The *Separate* process is implemented on top of FeatureHouse's composition operator in Java. We use the parsing process that was provided in FeatureHouse [4] to separate features to the FPH representation and added about 200 LOC.

The main process to check commutativity is implemented as a Python script in about 250 LOC. The first two parts of the commutativity check are directly implemented in the script. The third one, *Check shared variables*, requires considering possible aliases in feature-based Java programs. For this check, we have implemented a Java program, FPH varsAnalysis, that calls Soot [21] to build the call graph and analyze each reachable method. FPH varsAnalysis is an interprocedural, context-insensitive points-to analysis that, given two feature fragments that superimpose the same method, checks whether a variable of the same type is written by at least one of them and read or written by the other. Since feature fragments cannot be compiled by themselves (and thus Soot cannot be used on them), in order to do alias analysis, our program requires a representation that consists of the base system and all possible features. This representation is readily available for systems from the SPLVerifier repository since it uses a family-based approach to analysis. We generate a similar representation for all other systems used in our experiments.

**Models and Methods.** We have applied FPH to 29 case studies written in Java. In the first five columns of Table 1, we summarize the information about these systems. The first six have been considered by SPLVerifier [6] – a tool for checking whether a software product line (SPL) satisfies its feature specifications. SPLVerifier includes sample-based, product-based and family-based analyses and assumes that the order in which features should be composed is provided. The SPLVerifier examples came with specifications given by aspects woven at base system points, with an exception thrown if the state violates an expected property. The rest of our case studies are SPLs from the FeatureHouse repository [4].

We were unable to identify other techniques for analyzing feature commutativity of Java programs. Plath and Ryan [25] and Atlee et al. [8] compare different composition orders but handle only state machines. SPLVerifier [6] represents the state of the art in verification of feature-based systems expressed in Java, but it is not designed to do non-commutativity analysis. In the absence of alternative tools, we adapted SPLVerifier to the task of finding non-commutativity violations to be able to compare with FPH.

We conducted two experiments to evaluate FPH and to answer our research questions. For the first, we ran SPLVerifier on the first six systems (all properties that came with them satisfied the pattern in Sect. 2 and thus were appropriate for commutativity detection) presented in Table 1 to identify non-commutativity interactions. Since SPLVerifier is designed to check products against a set of specifications, we have to define what a commutativity check means in this context. For a pair of features, SPLVerifier would detect a commutativity violation if, upon composing these features in different orders, the provided property evaluates to different values. During this check, SPLVerifier considers composition of all other features of the system in all possible orders and thus can identify two-way, three-way, etc. feature interactions, if applicable. We measured the time taken by SPLVerifier and the number of interactions found.

For the second experiment, we checked all 29 systems using FPH to identify non-commutativity interactions. We measured the number of feature pairs that required checking for shared variables, the time the analysis took and the precision of FPH in finding interactions. We were unable to establish ground truth for non-commutativity analysis in cases where FPH required the shared variables check due to our tool's reliance on Soot's unsound call graph construction [7]. Thus, we measured precision of our analysis by manually analyzing the validity of every interaction found by FPH. We also calculated SPLVerifier's *relative* recall, i.e., the ratio of non-commutativity-related interactions detected by FPH that were also detected by SPLVerifier. We did not encounter any interactions that were detected by SPLVerifier but not by FPH.

When the shared variables check is not necessary, our technique is sound. In such cases, if we inform the user that two features are commutative, they certainly are, and there is no need to define an order between them. As shown below, soundness was affected only for a small number of feature pairs. Moreover, advances in static analysis techniques may improve our results for those cases in the future. Our experiments were performed on a virtual machine with 2 GB of RAM, hosted on a dual-core Intel Core i5 machine at 1.3 GHz.

**Results.** Columns 6–10 of Table 1 summarize results of our experiments, including, for the first six examples, SPLVerifier's precision and (relative) recall. "SV pairs" capture the number of feature pairs for which the shared variables check was required. A dash in the precision columns means that the measurement was not meaningful since no interactions were detected. E.g., SPLVerifier does not detect any non-commutativity interactions for Email, and FPH does not find any non-commutativity interactions for EPL. FPH found a number of instances of non-commutativity such as the one between *ExecutiveFloor* and *TwoThirdsFull* in the Elevator System. Only one SV check was required (while checking *Empty* and *Weight* features). Without our technique, the user would need to


**Table 1.** Overview of case studies.

provide an order between the five features of the Elevator System, that is, specify 20 (5 × 4) ordering constraints. FPH allows us to conclude that *ExecutiveFloor* and *TwoThirdsFull* do not commute, that *Empty* and *Weight* likely commute but this is not guaranteed, and that all other pairs of features do commute. Thus, only two feature pairs required further analysis by the user.

The Minepump system did not require the shared variable check at all and thus FPH analysis for it is sound, and all three of the found interactions were manually confirmed to be "real" (thus, precision is 1). ChatSystem/Weiss has nine features which would imply needing to define the order between 72 (9 × 8) feature pairs. Four non-commutativity cases were found, all using the shared variables check, but only three were confirmed as "real" via a manual inspection (thus, precision is 0.75). We conclude that FPH is effective in discovering non-commutativity violations and proving their absence (**RQ1**).

We now turn to studying the accuracy of FPH w.r.t. finding non-commutativity violations (**RQ2**). From Table 1, we observe that for the Elevator System, both FPH and SPLVerifier correctly detect a non-commutativity interaction. For the Minepump system, SPLVerifier only finds two out of the three interactions found by FPH (relative recall = 0.67). For the Email system, AJStats, ZipMe, and GPL, the specifications available in SPLVerifier do not allow detecting any of the non-commutativity interactions found by FPH (relative recall = 0).

GPL was a problematic case for FPH, affecting its precision. The graph algorithms in this example take a set of vertices and create and maintain an internal

**Fig. 6.** (a) Number of FPH varsAnalysis calls per system; (b) Time spent by FPH varsAnalysis per system; (c) Percentage of non-commutativity checks where BP or SV analyses were applied last. (Color figure online)

data structure (e.g., to calculate the vertices involved in the shortest path or in a strongly connected component). With this data structure, our analysis found a number of possible shared variables and incorrectly deemed several features as non-commutative. E.g., the algorithms to find cycles or the shortest path between two nodes access the same set of vertices but change different fields and thus are commutative. One way of avoiding such false positives would be to implement a field-sensitive alias analysis. While more precise, it would be significantly slower than our current shared variables analysis.

For the remaining systems, either FPH's reported interactions were "real", or, in cases where it returned some false positives (ChatSystemBurke, ChatSystemWeiss, and TankWar), this was due to the precision of the alias analysis. Thus, given SPLVerifier's set of properties, FPH always exhibited precision and recall at least as good as SPLVerifier's. Moreover, for all but three of the remaining systems, FPH exhibited perfect precision. We thus conclude that FPH is very accurate (**RQ2**).

We now turn to the efficiency of our analysis (**RQ3**). The time it took to separate features into behavior and composition was usually under 5 s. The outlier was BerkeleyDB, which took about a minute, due to the number of features and especially fragments (BerkeleyDB has 2667 fragments whereas Violet has 912 and the other systems have at most 229). In general, the time taken by FPH's commutativity check was highly influenced by the number of calls to FPH varsAnalysis. Figure 6a shows the number of calls to FPH varsAnalysis as the number of features increases. E.g., BerkeleyDB has 98 features and required only one call to FPH varsAnalysis, while AJStats has 19 features and required 136 of these calls. A larger number of features does not necessarily imply more of these checks. E.g., Violet and BerkeleyDB required fewer checks than AJStats, TankWar, and GPL, and yet they have more features.

Figure 6b shows the overall time spent by FPH varsAnalysis per system being analyzed. NotepadQuark and Violet took more time (resp., 1192 sec. and 1270 sec.) than GPL (1084 sec.) since these systems have calls to Java GUI libraries (awt and swing), thus resulting in a larger call graph than for GPL. A similar situation occurred during checking TankWar (1790 sec.) and AJStats (1418 sec.). It took FPH under 200 s in most cases and less than 35 min in the worst case to analyze non-commutativity (see Fig. 6b). FPH was efficient because FPH varsAnalysis was required for a relatively small fraction of pairs of feature fragments. We plot this information in Fig. 6c. For each analyzed system, it shows the percentage of feature fragments for which *behavior preservation* (BP) or *shared variables* (SV) was the last check conducted by FPH (out of the possible 100%). We omit the systems for which these checks were required for less than 1% of feature pairs. The figure shows that the calls to FPH varsAnalysis (to compute SV, in blue) were not required for over 96% of feature pairs.

To check for non-commutativity violations, SPLVerifier needs to check all possible products, which is infeasible in practice. So we set the timeout to one hour, during which SPLVerifier was able to check 110 products for Elevator, 57 for Email, 151 for Minepump, 3542 for GPL, 2278 for AJStats and 1269 for ZipMe. For each of these systems, a different check is required for every specification, thus the same product is checked more than once if more than one specification exists. Even though GPL, AJStats and ZipMe are larger systems with more features, they have fewer properties associated with them and therefore we were able to check more products within one hour. Thus, to answer **RQ3**, FPH was much more efficient than SPLVerifier in performing non-commutativity analysis. SPLVerifier was only able to analyze products containing the base system and at most three features before reaching a timeout. Moreover, FPH can *guarantee commutativity*, while SPLVerifier cannot, since its results depend on the properties it is given.

Our experiments also allow us to conclude that our technique is highly scalable (**RQ4**). E.g., the percentage of calls to FPH varsAnalysis is shown to be small and increases only slightly with the number of fragments (see Fig. 6a and b).

**Threats to Validity.** Our results may not generalize to other feature-based systems expressed in Java. We believe we have mitigated this threat by running our tool on examples provided by FeatureHouse. They include a variety of systems of different sizes which we consider to be representative of typical Java feature-based systems. As mentioned earlier, our use of SPLVerifier was not as intended by its designers. We also had no ground truth when the shared variable check was required. For those few cases, we calculated SPLVerifier's relative instead of actual recall.

## **5 Related Work**

In this section, we survey related work on modular feature definitions, feature interaction detection and commutativity-related feature interactions.

**Modular Feature Definitions.** A number of approaches to modular feature definitions have been proposed. E.g., the composition language in [8] includes states in which the feature is to be composed (similar to our *fg.location*) and the feature behavior (similar to our *fb.body*). Other work [4,9,10] uses superimposition of FSTs to obtain the composed system. In [14,25], new variables are added or existing ones are changed with particular kinds of composition (either executing a new behavior when a particular variable is read, or adding a check before a particular variable is set). These approaches treat the feature behavior together with its composition specification. Instead, our approach automatically separates the feature definition into the behavioral and the composition part, enabling a more scalable and efficient analysis.

**Feature Interaction Detection.** Calder et al. [13] survey approaches for analyzing feature interactions. Interactions occur because the behavior of one feature is being affected by others, e.g., by adding non-deterministic choices that result in conflicting states, by adding infinite loops that affect termination, or by affecting some assertions that are satisfied by the feature on its own. Checking these properties as well as those discussed in more recent work [8,15,18,19] requires building the entire SPL. Additionally, all these approaches consider state machine representations which are not available for most SPLs, and extracting them from code is non-trivial. SPLLift [12] is a family-based static analysis tool not directly intended to find interactions. Any change in a feature would require building the family-based representation again, whereas we conduct modular checks between features. Spek [26] is a product-based approach that analyzes whether the different products satisfy provided feature specifications. It does not check whether the features commute.

**Non-commutativity-Related Feature Interactions.** [5,8] also looked at detecting non-commutativity-related feature interactions. [5] presents a feature algebra and shows why composition (by superimposition) is, in general, not commutative. [8] analyzes feature commutativity by checking for bisimulation, and the result of the composition is a state machine representing the product. Neither work reports on a tool or applies to systems expressed in Java.

**Aspect-Oriented Approaches.** Storzer et al. [27] present a tool prototype for detecting precedence-related interactions in AspectJ. Technically, this approach is very similar to ours: it (a) detects which advice is activated at the same place; (b) checks whether the proceed keyword and exceptions are present; and (c) analyzes read and written variables. Yet, the focus is on aspects, and often many aspects are required to implement a single feature [23]. This implies that for m features with an average of n aspects each, the analysis in [27] needs to make O((m · n)<sup>2</sup>) checks, while our approach requires O(m<sup>2</sup>) checks. Therefore, the approach in [27] might be significantly slower than FPH. [1] analyzes interactions of aspects given by composition filters by checking for simulation among all the different orderings in which advice with shared joinpoints can be composed. As the number of advice with shared joinpoints increases, that approach considers every possible ordering, while we keep the analysis pairwise. [16,20] define modular techniques to check properties of aspect-oriented systems. [16] uses assume-guarantee reasoning to verify and detect interactions even when aspects can be activated within other aspects. It does not require an order but does require specifications to detect whether a certain composition order would not satisfy these. [20] uses the explicit CTL model-checking algorithm to distribute global properties into local properties to be checked for each aspect. This yields a modular check. In addition to requiring specifications, this technique assumes AspectJ's ordering of aspects.

## **6 Conclusion and Future Work**

In this paper, we presented a compositional approach for checking non-commutativity of features in systems expressed in Java. The method is based on determining whether pairs of features can write to the same variables, in which case the order in which features are composed with the base system may determine their valuation. The method is complementary to other feature interaction detection approaches such as [6,12] in that it helps build an order in which features are to be composed. When two features commute, they can be composed in any order. In addition, this method helps detect a number of feature interactions. The method is implemented in our framework FPH – Mr. Feature Potato Head. FPH does not require specifying properties of features and does not need to consider the entire set of software products every time a feature is modified. By performing an extensive empirical evaluation of FPH, we show that the approach is highly scalable and effective. In the future, we plan to further evaluate our technique, handle languages outside of Java and experiment with more precise methods for determining shared variables.

**Acknowledgements.** We thank anonymous reviewers for their helpful comments. This research has been supported by NSERC.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Taming Multi-Variability of Software Product Line Transformations**

Daniel Strüber1(B), Sven Peldszus<sup>1</sup>, and Jan Jürjens1,2

<sup>1</sup> Universität Koblenz-Landau, Koblenz, Germany

{strueber,peldszus,juerjens}@uni-koblenz.de <sup>2</sup> Fraunhofer Institute for Software and Systems Engineering, Dortmund, Germany

**Abstract.** Software product lines continuously undergo model transformations, such as refactorings, refinements, and translations. In product line transformations, the dedicated management of variability can help to control complexity and to benefit maintenance and performance. However, since no existing approach is geared for situations in which both the product line *and* the transformation specification are affected by variability, substantial maintenance and performance obstacles remain. In this paper, we introduce a methodology that addresses such *multivariability* situations. We propose to manage variability in product lines and rule-based transformations consistently by using annotative variability mechanisms. We present a staged rule application technique for applying a variability-intensive transformation to a product line. This technique enables considerable performance benefits, as it avoids enumerating products or rules upfront. We prove the correctness of our technique and show its ability to improve performance in a software engineering scenario.

## **1 Introduction**

Software product line engineering [1] enables systematic reuse of software artifacts through the explicit management of variability. Representing a *software product line* (SPL) in terms of functionality increments called *features*, and mapping these features to development artifacts such as domain models and code, makes it possible to generate custom-tailored products on demand by retrieving the corresponding artifacts for a given feature selection. Companies such as Bosch, Boeing, and Philips use SPLs to deliver tailor-made products to their customers [2].

Despite these benefits, a growing amount of variability leads to combinatorial explosions of the product space and, consequently, to severe challenges. Notably, this applies to software engineering tasks such as refactorings [3], refinements [4], and evolution steps [5], which, to support systematic management, are often expressed as model transformations. When applying a given model transformation to an SPL, a key challenge is to avoid enumerating and considering all possible products individually. To this end, Salay et al. [6] have proposed an algorithm that "*lifts*" regular transformation rules to a whole product line. The algorithm transforms the SPL, represented as a variability-annotated domain model, in such a way as if each product had been considered individually.

Yet, in complex transformation scenarios as increasingly found in practice [7], it is not only the considered models that include variations: the transformation system can contain variability as well, for example, due to desired optional behavior of rules, or due to rule variants arising from the sheer complexity of the involved meta-models. While a number of works [8–10] support systematic reuse to improve maintainability, *variability-based (VB) model transformation* [11,12] also aims to improve performance when a transformation system with many similar rules is executed. To this end, these rules are represented as a single rule with variability annotations, called a *VB rule*. During rule applications, a special *VB rule application* technique [13] saves redundant effort by considering common rule parts only once. In summary, for cases where either the model or the transformation system alone contains variability, solid approaches are available.

However, a more challenging case occurs when a variability-intensive transformation is applied to an SPL. In this *multi-variability* setting, where *both* the input model and the specification of a transformation contain variability, the existing approaches fall short of dealing with the resulting complexity: one can either consider all rules, so that they can be "lifted" to the product line, or consider all products, so that they become amenable to VB model transformation. Both approaches are undesirable, as they require enumerating an exponentially growing number of artifacts and, therefore, threaten the feasibility of the transformation.

In this paper, we introduce a methodology for SPL transformations inspired by the *uniformity principle* [14], a tenet that suggests handling variability consistently throughout all software artifacts. We propose to capture the variability of SPLs and transformations using variability-annotated domain models and rules. Model and rule elements are annotated with *presence conditions*, specifying the conditions under which the annotated elements are present. The presence conditions of model and rule elements are specified over two separate sets of features, representing SPL and rule variability. Annotated domain models and rules can be created manually using available editor support [15,16], or automatically from existing products and rules by using merge-refactoring techniques [17,18].

Given an SPL and a VB rule, as shown in Fig. 1, we provide a *staged* rule application technique (black arrow) for applying a VB rule to a SPL. In contrast to the state of the art (shown in gray), enumerating products or rules upfront is not required. By adopting this technique, existing tools that use transformation technology, such as refactoring engines, may benefit from improved performance.

Specifically, we make the following contributions:


**Fig. 1.** Overview

– We propose to manage variability in product lines and rule-based transformations consistently, using annotative variability mechanisms for both.

– We present a staged rule application technique for applying a variability-intensive transformation to a product line that avoids enumerating products or rules upfront, and we prove its correctness.

– We evaluate the usefulness of our technique by studying its performance in a substantial number of cases within a software engineering scenario.

Our work builds on the underlying framework of algebraic graph transformation (AGT) [19]. AGT is one of the standard model transformation language paradigms [20]; in addition, it has recently gained momentum as an analysis framework for other widespread transformation languages such as ATL [21]. We focus on the annotative paradigm of variability. Suitable converters to and from alternative paradigms, such as the composition-based one [22], may allow our technique to be used in other cases as well.

The rest of this paper is structured as follows: We motivate and explain our contribution using a running example in Sect. 2. Section 3 revisits the necessary background. Section 4 introduces the formalization of our new rule application technique. The algorithm and its evaluation are presented in Sects. 5 and 6, respectively. In Sect. 7 we discuss related work, before we conclude in Sect. 8.

## **2 Running Example**

In this section, we introduce SPLs and variability-based model transformation by example, and motivate and explain our contribution in the light of this example.

**Software Product Lines.** An SPL represents a collection of models that are similar to, yet different from, each other. Figure 2 shows a washing machine controller SPL in an annotative representation, comprising an annotated domain model and a feature model. The feature model [23] specifies a root feature *Wash* with three optional children *Heat*, *Delay*, and *Dry*, where *Heat* and *Delay* are mutually exclusive. The domain model is a statechart diagram specifying the behavior of the controller SPL based on the states *Locking*, *Waiting*, *Washing*, *Drying*, and *UnLocking*, with transitions between them. Presence conditions, shown as gray labels, denote the conditions under which the annotated elements are present. These conditions are used to specify variations in the execution behavior.

Concrete products can be obtained from *configurations*, in which each optional feature is set to either *true* or *false*. A product arises by removing

**Fig. 2.** Washing machine controller product line and product (adapted from [6]).

those elements whose presence condition evaluates to false in the given configuration. For instance, selecting *Delay* and deselecting *Heat* and *Dry* yields the product shown on the right of Fig. 2. The SPL has six configurations and products in total, since *Wash* is non-optional and *Delay* excludes *Heat*.
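To make the derivation concrete, the following is a minimal sketch (our own encoding, not the authors' tooling) of deriving a product from an annotated domain model by evaluating presence conditions. The element names and conditions are loosely modeled on Fig. 2 and are illustrative only.

```python
# Sketch: deriving a product from an annotated domain model.
# Presence conditions are encoded as predicates over a configuration dict.

def derive_product(elements, presence_conditions, config):
    """Keep only elements whose presence condition holds in `config`.

    Elements without an explicit presence condition are always present."""
    return [e for e in elements
            if presence_conditions.get(e, lambda c: True)(config)]

# Washing-machine example (illustrative transition names):
elements = ["Locking->Waiting", "Waiting->Washing",
            "Locking->Washing", "Washing->Drying"]
pcs = {
    "Locking->Waiting": lambda c: c["Delay"],        # only with Delay
    "Locking->Washing": lambda c: not c["Delay"],    # only without Delay
    "Washing->Drying":  lambda c: c["Dry"],          # only with Dry
}

config = {"Delay": True, "Heat": False, "Dry": False}
print(derive_product(elements, pcs, config))
# keeps the Delay-dependent transition, drops the Dry-dependent one
```

This mirrors the removal step described above: a configuration assigns each optional feature a truth value, and elements with a false presence condition are dropped.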

**Variability-Based (VB) Model Transformation.** In complex model transformation scenarios, developers often create rules that are similar to, yet different from, each other. As an example, consider the two rules *foldEntryActions* and *foldExitActions* (Fig. 3), called A and B for short. These rules express a "fold" refactoring for statechart diagrams: if a state has two incoming or outgoing transitions with the same action, these actions are to be replaced by an entry or exit action of the state. The rules have a left- and a right-hand side (LHS, RHS). The LHS specifies a pattern to be matched to an input graph, and the difference between the LHS and the RHS specifies a change to be performed for each match, such as the removal of transition actions and the addition of entry and exit actions.

Rules A and B are simple; however, in a realistic transformation system, the number of required rules can grow exponentially with the number of variation points in the rules. To avoid combinatorial explosion, a set of variability-intensive rules can be encoded into a single representation using a *VB rule* [12,18]. A VB rule consists of a LHS, a RHS, a *feature model* specifying a set of interrelated features, and *presence conditions* annotating LHS and RHS elements with the conditions under which they are present. Individual "flat" rules are obtained via configuration, i.e., binding each feature to either *true* or *false*. In the VB rule A + B, the feature model specifies a root feature *refactor* with alternative child features *foldEntry* and *foldExit*. Since exactly one child feature has to be active at a time, two possible configurations exist. The two rules arising from these configurations are isomorphic to rules A and B.
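The configuration space of the VB rule A + B can be enumerated mechanically. The sketch below (our own encoding; the constraint formula for the alternative-group is ours) brute-forces the valid configurations of the rule's feature model:

```python
# Sketch: enumerating valid configurations of a VB rule's feature model.
# Feature names come from the running example; the alternative-group
# constraint (exactly one child active) is encoded as inequality.
from itertools import product

features = ["foldEntry", "foldExit"]

def valid(config):
    # root `refactor` is always active; children are alternatives
    return config["foldEntry"] != config["foldExit"]

configs = [dict(zip(features, vals))
           for vals in product([True, False], repeat=len(features))
           if valid(dict(zip(features, vals)))]
print(len(configs))  # two flat rules: one isomorphic to A, one to B
```

Each valid configuration induces one flat rule, obtained by dropping the elements whose presence condition evaluates to false.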

**Problem Statement.** Model transformations such as *foldActions* are usually designed for application to a concrete software product, represented by a single

**Fig. 3.** Two rules and their encoding into a variability-based rule (adapted from [24]).

model. However, in various situations, it is desirable to extend the usage context to a *set* of models collected in an SPL. For example, during the batch refactoring of an SPL, all products should be refactored in a uniform way.

Variability is challenging for model transformation technologies. As illustrated in Table 1, products and rules need to be considered in manifold combinations. In our example, without dedicated variability support, the user needs to specify 6 products and 2 rules individually and trigger a rule application for each of the 12 combinations. A better strategy is enabled by VB model transformation: by applying the VB rule A + B, only 6 combinations need to be considered. Another strategy is to apply rules A and B to the SPL by *lifting* [6] them, leading to 2 combinations and the biggest improvement so far. Still, in more complex cases, all of these strategies are insufficient. Since none of them avoids an exponential growth in the number of optional SPL features (#F_P) or optional rule features (#F_r), the feasibility of the transformation is threatened.
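The combination counts just discussed can be tabulated directly. The following sketch reproduces the numbers for the running example (6 products, 2 rules); the strategy labels are our paraphrases of Table 1:

```python
# Sketch: product/rule combinations each strategy must consider in the
# running example. Strategy labels are our paraphrases of Table 1.
n_products, n_rules = 6, 2
combinations = {
    "no variability support": n_products * n_rules,  # 12 individual applications
    "VB rule per product":    n_products,            # 6: rules encoded as one VB rule
    "lifting each rule":      n_rules,               # 2: products handled by lifting
    "staged application":     1,                     # both kinds of variability encoded
}
print(combinations)
```

The staged approach reduces the number of independent combinations to one, since rule applications are never restarted from scratch.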

**Table 1.** Approaches for dealing with multi-variability.


**Fig. 4.** Staged rule application of a VB rule to a product line.

**Solution Overview.** To address this situation, we propose a *staged* rule application technique for applying a VB rule to an SPL. As shown in Fig. 4, this technique proceeds in three steps: In step 1, we consider the base rule, that is, the common portion of the rules encoded in the VB rule, and match its LHS to the full domain model, temporarily ignoring its presence conditions. For example, considering rule A + B, the LHS of the base rule contains precisely the states x1, x2, and x. A match to the domain model is indicated by dashed arrows. Using the presence conditions, we determine if the match can be mapped to any specific product. In step 2, we extend the identified base matches to identify full matches of the rules encoded in the VB rule. In the example, we would derive rules A and B; in general, to avoid fully flattening all involved rules, one can incrementally consider common subrules. An example match is denoted in terms of dashed lines for the mappings of transitions and actions. In step 3, to perform rule applications based on identified matches, we use *lifting* to apply the rule for which the match was found. Lifting transforms the domain model and its presence conditions in such a way as if each product had been considered individually. In the example, only products for the configuration {*Delay = true; Heat = false*} are amenable to the *foldActions* refactoring. Consequently, the new entry action *startWash* has the presence condition *Delay*, and other presence conditions are adjusted accordingly. Failure to find suitable matches, or to fulfill a certain condition during lifting (discussed later), allows early termination of the process.

Performance-wise, the benefit of this technique is twofold: First, using the termination criteria, we can exit the matching process early without considering specifics of products and rule variants. This is particularly beneficial in situations where none or only a few rules of a larger rule set are applicable most of the time, which is typically the case, for example, in translators. Second, even if we have to enumerate some rules in step 2, we do not have to start the matching process from scratch, since we can save redundant effort by extending the available base matches. Consequently, Table 1 gives the number of independent combinations (in the sense that rule applications are started from scratch) as 1.

## **3 Background**

We now introduce the necessary prerequisites of our methodology, starting with the double-pushout approach to algebraic graph transformation [19]. As the underlying structure, we assume the category of graphs with graph morphisms (referred to as *morphisms* from here on), although all considerations are likely compatible with additional graph features such as typing and attributes.

**Definition 1 (Rules and applications).** *A rule* r = (L ← I → R) *consists of graphs* L, I, *and* R, *called* left-hand side, interface graph, *and* right-hand side, *respectively, and two injective morphisms* le : I → L *and* ri : I → R.

*Given a rule* r, *a graph* G, *and a morphism* m : L → G, *a* rule application *from* G *to a graph* H, *written* G ⇒_{r,m} H, *arises from the diagram to the right, where (1) and (2) are pushouts.* G, m, *and* H *are called* start graph, match, *and* result graph, *respectively.*

A rule application exists iff the match m fulfills the *gluing condition*, which, in the category of graphs, boils down to the *dangling condition*: all edges adjacent to a deleted node in m's image m[L] must have a preimage in L.
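The dangling condition can be checked mechanically on a simple edge-set encoding of graphs. The sketch below is our own minimal illustration, not the categorical construction; graphs are sets of directed edges and the "rule image" is the set of matched (and thus co-deleted) edges:

```python
# Sketch: the dangling condition on a minimal directed-graph encoding.
# A node may only be deleted if every edge touching it is itself matched
# by the rule (and therefore deleted along with it).

def dangling_ok(edges, deleted_nodes, matched_edges):
    """True iff no edge adjacent to a deleted node is left dangling."""
    return all(e in matched_edges
               for e in edges
               if e[0] in deleted_nodes or e[1] in deleted_nodes)

edges = {("a", "b"), ("b", "c")}
print(dangling_ok(edges, {"b"}, {("a", "b"), ("b", "c")}))  # True: all adjacent edges matched
print(dangling_ok(edges, {"b"}, {("a", "b")}))              # False: ("b", "c") would dangle
```

If the check fails, no rule application exists at that match, matching the iff-condition stated above.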

**Product Lines.** Our formalization represents product lines on the semantic level by considering interrelations between the included graphs. The domain model is a "maximal" graph of which all products are subgraphs. The presence-condition function maps subgraphs (rather than elements, as done on the syntactic level) to terms in the Boolean term algebra over features, written T_BOOL(F_P). The set of all subgraphs of the domain model is written P(M_P).

**Definition 2 (Product line, configuration, product)**


**Definition 3 (Lifted rule application).** *Given a product line* P, *a rule* r, *and a match* m : L → M_P, *a* lifted rule application P ⇒↑_{r,m} Q *is a construction that relates* P *to a product line* Q *s.t.* F_P = F_Q, Φ_P = Φ_Q, *and the set of products* Flat(Q) *is the same as if* r *was applied to each product* P_i ∈ Flat(P) *for which an inclusion* j : m[L] → P_i *from the image of* m *exists.*

Salay et al. [6] provide an algorithm that is shown to satisfy the properties required in Definition 3. The algorithm extends a rule application to the domain model by a check that the match can be mapped to at least one product, and by dedicated presence condition handling during additions and deletions. A more declarative treatment is offered by the product line pushout construction of Taentzer et al. [25], which is designed to support lifted rule application as a special case.

**Variability-Based Transformation.** VB rules are defined similarly to product lines, with a "maximal" rule instead of a domain model and a notion of subrules instead of subgraphs. A subrule is a rule that can be embedded injectively into a larger rule s.t. the actions of rule elements are preserved [12]; e.g., deletions are mapped to deletions. The set of all subrules of a rule r is written P(r).

**Definition 4 (Variability-based (VB) rule).** *A VB rule* ř = (F_ř, Φ_ř, r_ř, f_ř) *consists of three parts: a* feature model *that consists of a set* F_ř *of features and a set of* feature constraints Φ_ř ⊆ T_BOOL(F_ř), *a* maximal rule r_ř *being a rule, and a* set of presence conditions *expressed as a function* f_ř : P(r_ř) → T_BOOL(F_ř).

To later consider the *base rule*, that is, a maximal subrule of multiple flat rules, we define the flattening of VB rules in terms of consecutive intersection and union constructions, expressed as multi-pullbacks and multi-pushouts [12]. The multi-pullback r_0 gives the base rule, over which the flat rule arises by multi-pushout.

**Definition 5 (Flat rule).** *Given a VB rule* ř *and a valid configuration* c *w.r.t.* Φ_ř, *there exists a unique set of* n *subrules* S_c ⊆ P(r_ř) *s.t.* ∀s ∈ P(r_ř) : s ∈ S_c *iff* c *satisfies* f_ř(s). *Merging these subrules via multi-pullback and multi-pushout over* r_ř *and* r_0, *respectively, yields a rule* r_c, *called the* flat rule induced by c. *The* flattening *of* ř *is the set* Flat(ř) *of all flat rules of* ř: Flat(ř) = {r_c | r_c *is a flat rule of* ř}.

In the example, r_ř is the rule A + B, ignoring presence conditions. Given the configuration c = {*foldEntry = true, foldExit = false*}, the multi-pullback over each subrule whose presence condition satisfies c yields as the base rule r_0 precisely the part of rule A + B without presence conditions (i.e., only the states). The resulting flat rule r_c is isomorphic to rule A.

As a prerequisite for achieving efficiency during staged application, we revisit VB rule application. The key idea is that matches of a flat rule are composed from matches of all of its subrules. By considering the subrules during matching, we can reuse matches over several rules and identify early-exit opportunities.

**Definition 6 (VB match family, VB match, VB rule application)**


– *Given a variability-based match* m̌ = (m_c, c) *for* ř *and* G, *the* application of ř at m̌ *is the rule application* G ⇒_{r_c,m_c} H *of the flat rule* r_c *at the match* m_c.

In the example, a VB match family is obtained: Step 1 collects matches of the LHS L_0. Step 2 reuses these matches to match the flat rules: according to the compatibility condition, we may extend the matches rather than start from scratch. The set of VB rule applications of a rule ř to a model G is equivalent to the set of rule applications of all flat rules in Flat(ř) to G [12, Theorem 2].

## **4 Multi-variability of Product Line Transformations**

A variability-based rule represents a set of similar transformation rules, while a product line represents a set of similar models. We consider the application of a variability-based rule to a product line from a formal perspective. Our idea is to combine two principles of *maximality*, which, up to now, were considered in isolation: First, by applying a rule to a "maximum" of all products, the rule can be lifted efficiently to a product line (Definition 3). Second, by reusing matches of a maximal subrule, several rules can be applied efficiently to a single model (Definition 6).

We study three strategies for applying a variability-based rule ř to a product line P; the third one leads to the notion of *staged rule application* as introduced in Sect. 2. First, we consider the naive case of flattening ř and P and applying each rule to each product. Second, we take the two maximality principles into account to avoid the flattening of ř. Third, we use additional aspects of the first principle to avoid the flattening of P as well. We show that all strategies are equivalent in the sense that they change all of P's products in the same way.

### **4.1 Fully Flattened Application**

**Definition 7 (Fully flattened application).** *Given the flattening of a product line* P *and the flattening of a rule family* ř, *the set of* fully flattened rule applications Trans_FF(P, ř) *arises from applying each rule to each product:*

$$Trans_{FF}(P, \check{r}) = \{P_i \Rightarrow_{r_c, m_c} Q_i \mid P_i \in Flat(P),\; r_c \in Flat(\check{r}),\; \text{match } m_c : L_c \to P_i\}$$

In the example, there are two rules and six products; however, only for two products, namely those arising from configurations with *Delay = true* and *Heat = false*, does a match, and therefore a rule application, exist, as we saw in the earlier description of the example. Trans_FF(P, ř) comprises the resulting two rule applications.

### **4.2 Partially Flattened Application**

We now consider a strategy that aims to avoid flattening the variability-based rule ř. We use the fact that the rules in ř generally share a maximal, possibly empty subrule r_0 that can be embedded into all rules in ř. Moreover, we exploit the fact that each product has an inclusion into the domain model.

The key idea is as follows: each match of a flat rule to a product includes a match of r_0 into the domain model M_P. Absence of such a match implies that none of the rules in ř has a match, allowing us to stop without considering any flat rule in its entirety. Such an exit point is particularly beneficial if the VB rule represents a subset of a larger rule set in which only a few rules can be matched at a time. Conversely, if a match for r_0 exists, a rule application arises if the match can be "rerouted" onto one of the products P_i. In this case, we consider the flat rules, saving redundant matching effort by reusing the matches of r_0.

**Fig. 5.** Partially flattened rule application.

To reuse matches to the domain model for the products, we introduce the rerouting of a morphism from its codomain onto another graph G′. We omit naming the codomain and G′ explicitly where they are clear from the context.

**Definition 8 (Rerouted morphism).** *Let an inclusion* i : G′ → G, *a morphism* m : L → G *with an epi-mono-factorization* (e, m′), *and a morphism* j : m[L] → G′ *be given, s.t.* m′ = i ∘ j. *The* rerouted morphism reroute(m, G′) : L → G′ *arises by composition:* reroute(m, G′) = j ∘ e.

**Definition 9 (Rerouted variability-based match).** *Given a graph* G, *a variability-based rule* ř *with a variability-based match* m̌ = (m_c, c) *(Definition 6), and an inclusion* i : G′ → G. *If the epi-mono-factorization of* m_c *and a suitable morphism* j *exist, a rerouted morphism onto* G′ *arises (Definition 8). Pairing this morphism with the configuration* c *induces the* rerouted variability-based match *of* m̌ *onto* G′: reroute(m̌, G′) = (reroute(m_c, G′), c).

In Fig. 5, m_{c,h} is the morphism obtained by rerouting a match m_{c,t} from the domain model M_P to the product P_h. For example, if m_{c,t} is the match indicated in steps 1 and 2 of Fig. 4, the morphism j and, consequently, m_{c,h} exist only for products in which all images of the mappings exist as well, e.g., the product shown on the right of Fig. 2. Note that m_{c,t} is a variability-based match to M_P: in an earlier explanation, we saw that the family (m_{i,t}) forms a variability-based match family. Therefore, per Definition 9, pairing m_{c,h} with the configuration c induces a variability-based match to P_h, which can be used as follows.
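On graphs encoded as plain node sets, rerouting boils down to checking that the match's image lies inside the product and reusing the same assignment. The sketch below is our own simplification (functions on node sets, no edge structure); the node names echo the running example but are illustrative:

```python
# Sketch: rerouting a morphism m : L -> G onto a subgraph G' of G.
# If the image m[L] is contained in G', the same assignment already is a
# morphism L -> G' (the composite j . e of Definition 8); otherwise no
# inclusion j : m[L] -> G' exists and rerouting fails.

def reroute(m, L, G_prime):
    """Return m retargeted to G', or None if its image leaves G'."""
    image = {m[x] for x in L}
    if not image <= G_prime:
        return None
    return {x: m[x] for x in L}

L = {"x1", "x2", "x"}                                 # base-rule LHS nodes
G_prime = {"Locking", "Waiting", "Washing"}           # a product's node set
m = {"x1": "Locking", "x2": "Waiting", "x": "Washing"}
print(reroute(m, L, G_prime))  # the same assignment, now targeting G'
```

When the image check fails for every product, the match contributes no rerouted VB match, which is exactly the early-exit opportunity exploited above.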

Variability-based rule application (Definition 6) allows us to save matching effort by considering shared parts of rules only once per graph. The following definition lifts this insight from graphs to product lines. We show that the sets of partially and fully flattened rule applications are equivalent.

**Definition 10 (Partially flattened application).** *Given a variability-based rule* ř *and a product line* P, *the set of* partially flattened rule applications Trans_PF(P, ř) *is obtained by rerouting all variability-based matches from the domain model* M_P *to products in* P *and collecting all resulting rule applications:*

$$\begin{aligned} Trans_{PF}(P, \check{r}) = \{ P_i \Rightarrow_{\check{r}, \check{m}'} Q_i \mid\; & \check{m} = (m_c, c) \text{ is a VB match of } \check{r} \text{ to } M_P,\\ & P_i \in Flat(P),\; \check{m}' = (reroute(m_c, P_i), c) \text{ is a VB match} \} \end{aligned}$$

**Theorem 1 (Equivalence of fully and partially flattened rule applications).** *Given a product line* P *and a variability-based rule* ř, Trans_FF(P, ř) = Trans_PF(P, ř).

*Proof idea.*<sup>1</sup> For every fully flattened (FF) rule application, we can find a corresponding partially flattened (PF) one, and vice versa: Given a FF rule application at a match m′, we compose m′ with the product inclusion into the domain model M_P to obtain a match m_c into M_P. Per Theorem 2 in [12], m_c induces a VB match and rule application. From a diagram chase, we see that m′ is the morphism arising from rerouting m_c onto the product P_i. Consequently, the rule application is PF. Conversely, a PF variability-based rule application induces a corresponding FF rule application by its definition.

### **4.3 Staged Application**

The final strategy we consider, staged application, aims to avoid flattening the products as well. This can be achieved by employing lifting (Definition 3): Lifting

<sup>1</sup> A full proof is provided in the extended version of this paper: http://danielstrueber. de/publications/SPJ18.pdf.

takes a single rule and applies it to a domain model and its presence conditions in such a way as if the rule had been applied to each product individually. The considered rule in our case is a flat rule with a match to the domain model.

Note that we cannot compare the set of staged applications directly to the sets of flattened applications, since it does not live on the product level. We can, however, compare the sets of products obtained from both sets of applications, which turn out to be the same, thus showing the correctness of our approach.

**Definition 11 (Staged application).** *Given a variability-based rule* ř *and a product line* P, *the set of* staged applications Trans_St(P, ř) *is the set of lifted rule applications obtained from VB matches to the domain model* M_P:

$$Trans_{St}(P, \check{r}) = \{P \Rightarrow^{\uparrow}_{r_c, m_c} Q \mid \check{m} = (m_c, c) \text{ is a VB match of } \check{r} \text{ to } M_P\}$$

**Corollary 1 (Equivalence of staged and partially flattened rule applications).** *Given a product line* P *and a variability-based rule* ř, *the sets of products obtained from* Trans_St(P, ř) *and* Trans_PF(P, ř) *are isomorphic.*

*Proof.* Since both sets are defined over the same set of matches of flat rules, the proof follows directly from the definition of lifting.

## **5 Algorithm**

We present an algorithm implementing the staged application of a VB rule ř to a product line P. Following the overview in Sect. 2 and the treatment in Sect. 4, the main idea is to proceed in three steps: First, we match the base rule of ř to the domain model, ignoring presence conditions. Second, we consider individual rules as far as necessary to obtain matches to the domain model. Third, based on the matches, we perform the actual rule applications by using the lifting algorithm from [6] in a black-box manner.


Algorithm 1 shows the computation in more detail. In line 1, ř's base rule r_0 is matched to the domain model Model_P, leading to a set of base matches. If this set is empty, we have reached the first exit criterion and can stop directly. Otherwise, given a match m, in line 2, we check if at least one product P_i exists that m can be rerouted onto (Definition 8). To this end, in lines 3–4, we use a SAT solver to check if there is a valid configuration of P's feature model for which all


**Table 2.** Subject rule set.

**Table 3.** Subject product lines.


presence conditions of matched elements evaluate to *true*. In this case, we iterate over the valid configurations of ř in line 5 (we could proceed at a finer granularity by using partial configurations; this optimization is omitted for simplicity). In line 6, a flat rule is obtained by removing all elements from the rule whose presence condition evaluates to *false*. We match this rule to the domain model in line 7; to save redundant effort, we restrict the search to matches that extend the current base match. Absence of such a match is the second stopping criterion. Otherwise, we feed the flat rule and the set of matches to lifting in line 8. Handling of dangling conditions is left to lifting; in the positive case, P is transformed afterwards.
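The staged loop can be sketched in a heavily simplified form. In the sketch below (our own encoding, not the authors' Henshin implementation), matching is reduced to set inclusion on element names, the SAT check is brute-forced, and lifting is stubbed to returning the applied rule's name; all helper names are ours:

```python
# Hedged sketch of the staged application loop (cf. Algorithm 1).
from itertools import product as assignments

def satisfiable(features, constraints):
    """Brute-force satisfiability over Boolean feature assignments."""
    return any(all(c(dict(zip(features, vals))) for c in constraints)
               for vals in assignments([True, False], repeat=len(features)))

def staged_apply(flat_rules, model, pcs, spl_features, spl_constraints):
    """flat_rules: rule name -> pattern (set of element names);
    model: element names of the domain model; pcs: element -> presence condition."""
    base = set.intersection(*flat_rules.values())       # step 1: base-rule "match"
    if not base <= model:
        return []                                       # first exit criterion
    # steps 2-4: can the base match be mapped to at least one product?
    pc_of_match = [pcs.get(e, lambda cfg: True) for e in base]
    if not satisfiable(spl_features, spl_constraints + pc_of_match):
        return []
    # steps 5-8: extend the base match per flat rule; lifting is stubbed
    return [name for name, pattern in flat_rules.items() if pattern <= model]

flat_rules = {"A": {"x1", "x2", "x", "in1:a", "in2:a"},
              "B": {"x1", "x2", "x", "out1:a", "out2:a"}}
model = {"x1", "x2", "x", "in1:a", "in2:a"}             # only rule A's pattern occurs
pcs = {"in1:a": lambda cfg: cfg["Delay"]}
features = ["Delay", "Heat", "Dry"]
constraints = [lambda cfg: not (cfg["Delay"] and cfg["Heat"])]
print(staged_apply(flat_rules, model, pcs, features, constraints))  # only A applies
```

As in the illustration that follows, the base pattern (the shared states) matches trivially, the satisfiability check passes, and only one flat rule extends the base match.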

For illustration, consider the base match m_1 = {*Locking, Waiting, Washing*} from Fig. 4. First we calculate Φ_pc. As none of the states in the domain model has a presence condition, Φ_pc is set to *true* and is identified as satisfiable. Two valid configurations exist, c_1 = {foldEntry = true, foldExit = false} and c_2 = {foldEntry = false, foldExit = true}. Considering c_1, the presence condition *foldExit* evaluates to false; removing the corresponding elements yields a rule isomorphic to rule A in Fig. 3. Match m_1 is now extended using this rule, leading to a match as shown in step 2 of Fig. 4, and then lifted, as discussed in the earlier explanation of the example. Step 2 is repeated for configuration c_2; yet, as no suitable match for c_2 exists, the transformation shown is the only possible one.

This algorithm benefits from the correctness results shown in Sect. 4. Specifically, it computes staged rule applications as per Definition 11: a configuration c is determined in line 5, and the values for the match m_c are collected in the set *Matches*. Via Corollary 1 and Theorem 1, the effect of the rule applications on the products is the same as if each product had been considered individually.

In terms of performance, two limiting factors are the use of a graph matcher and a SAT solver; both perform an NP-complete task. Still, we expect practical improvements from our strategy of reusing shared portions of the involved rules and graphs, and from the availability of efficient SAT solvers that scale to millions of variables [26]. This hypothesis is studied in Sect. 6.

## **6 Evaluation**

To evaluate our technique, we implemented it for Henshin [27,28], a graph-based model transformation language, and applied it to a transformation scenario with both product-line and transformation variability. The goal of our evaluation was to study whether our technique indeed produces the expected performance benefits.

**Setup.** The transformation is concerned with the detection of applied editing operations during model differencing [29]. This setting is particularly interesting for a performance evaluation: since differencing is a routine software development task, low latency of the tools used is a prerequisite for developer effectiveness. The rule set, called UmlRecog, is tailored to the detection of UML edit operations. Each rule detects a specific edit operation, such as "move method to superclass", based on a pair of model versions and a low-level difference trace. UmlRecog comprises 1404 rules, which, as shown in Table 2, fall into three main categories: *Create/Set*, *Change/Move*, and *Delete/Unset*. To study the effect of our technique on performance, an encoding of the rules into VB rules was required. We obtained this encoding using RuleMerger [18], a tool for generating VB rules from classic ones based on clustering and clone detection [30]. The result was 504 VB rules, each representing between 1 and 71 classic rules. UmlRecog is publicly available as part of a benchmark transformation set [31].

We applied this transformation to the 6 UML-based product lines specified in Table 3. The product lines came from diverse sources and include manually designed ones (1–2) and ones reverse-engineered from open-source projects (3–6). Each product line was available as a UML model annotated with presence conditions over a feature model. To produce the model version pairs used by UmlRecog, we automatically simulated development steps by nondeterministically applying rules from a set of edit rules to the product lines, using the lifting algorithm to account for presence conditions during the simulated editing step.
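The core idea of such a lifted edit step can be sketched as follows. The element names and the string encoding of presence conditions below are illustrative assumptions for this example, not the lifting algorithm's actual data structures:

```python
# Illustrative sketch of a lifted edit step. The key idea of lifting: an
# element created by a rule is annotated with the conjunction of the presence
# conditions of the elements it was matched against, so the change is visible
# in exactly the products where the match exists.

def conj(*pcs):
    """Conjoin presence conditions, dropping the neutral condition 'true'."""
    nontrivial = [pc for pc in pcs if pc != "true"]
    return " and ".join(nontrivial) if nontrivial else "true"

# An annotated domain model: element -> presence condition over features.
model = {
    "Class:Person":  "true",
    "Method:getAge": "FeatAge",
}

def lifted_copy_method(model, method, owner):
    """Lifted variant of an edit that copies a method into its owner class."""
    new_pc = conj(model[method], model[owner])
    model[method + "@copy"] = new_pc
    return new_pc

print(lifted_copy_method(model, "Method:getAge", "Class:Person"))  # FeatAge
```

In this way, a single application on the annotated model stands for an application on every product whose configuration satisfies the resulting condition.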


**Table 4.** Execution times (in seconds) of the lifting and the staged approach.

As a baseline for comparison, we considered the lifted application of each rule in UmlRecog. An alternative baseline of applying VB rules to the flattened set of products was not considered: The SPL variability in our setting is much greater than the rule variability, which implies a high performance penalty when enumerating products. Since we currently do not support advanced transformation features, e.g., negative application conditions and amalgamation, we used variants of the flat and the VB rules without these concepts. We used an Ubuntu 17.04 system (Oracle JDK 1.8, Intel Core i5-6200U, 8 GB RAM) for all experiments.

**Results.** Table 4 gives an overview of the results of our experiments. The total execution times for our technique were between 1.5 and 3.3 s, compared to 9.4 and 10.6 s for lifting, yielding a speedup by factors between 2.8 and 6.5. For both techniques, all execution times are in the same order of magnitude across product lines. A possible explanation is that the number of applicable rules was small: if the vast majority of rules can be discarded early in the matching process, the execution time is largely independent of the total number of rules.

The greatest speedups were observed for the *Change/Move* category, in which rule variability was the greatest as well, as indicated by the ratio between rules and VB rules in Table 2. This observation is in line with our rationale of reusing shared matches between rules. Regarding the number of products, no trend toward better scalability is apparent, suggesting that lifting alone is sufficient for controlling product-line variability. Still, the overall results confirm the hypothesis that our technique improves performance in situations with significant product-line and transformation variability.

**Threats to Validity.** Regarding external validity, we only considered a limited set of scenarios, based on six product lines and one large-scale transformation. We aim to apply our technique to a broader class of cases in the future. The version pairs were obtained in a synthetic process, arguably one that produces pessimistic cases. Our treatment so far is also limited to a particular transformation paradigm, AGT, and one variability paradigm, the annotative one. Still, AGT and annotative variability are the underlying paradigms of many state-of-the-art tools. Finally, we did not consider the advanced AGT concepts of negative application conditions and amalgamation in our evaluation; extending our technique accordingly is left as future work.

## **7 Related Work**

During an SPL's lifecycle, not only the domain model, but also the feature model evolves [32,33]. To support the combined transformation of domain and feature models, Taentzer et al. [25] propose a unifying formal framework which generalizes Salay et al.'s notion of lifting [6], yet in a different direction than ours: focusing on combined changes, this approach is not geared toward internal variability of rules; similar rules are considered separately. Both works could be combined using a rule concept with separate feature models for rule and SPL variability.

Beyond transformations of SPLs, transformations have been used to *implement* SPLs. Feature-oriented development [34] supports the implementation of features as additive changes to a base product. Delta-oriented programming [35] adds flexibility to this approach: changes are specified using *deltas* that support deletions and modifications as well. Impact analysis in an evolving SPL can be performed by transforming deltas using higher-order deltas that encapsulate certain evolution operators [5]. For increased flexibility regarding inter-product reuse, deltas can be combined with traits [36]. Sijtema [8] introduced the concept of variability rules to develop SPLs using ATL. Conversely, SPL techniques have been applied to certain problems in transformation development. Xiao et al. [37] propose to capture variability in the backwards propagation of bidirectional transformations by turning the left-hand-side model into an SPL. Hussein et al. [10] present a notion of rule templates for generating groups of similar rules based on a data provenance model. These works address only one dimension of variability, either that of an SPL or that of a transformation system.

In the domain of graph transformation reuse, rule refinement [9] and amalgamation [38] focus on reuse at the rule level; graph variability is not in their scope. Rensink and Ghamarian propose a solution for rule and graph decomposition based on a certain accommodation condition, under which the effect of the original rule application is preserved [39,40]. In our approach, by matching against the full domain model rather than decomposing it, we trade off compositionality for the benefit of imposing fewer restrictions on graphs and rules.

## **8 Conclusion and Future Work**

We propose a methodology for software product line transformations in which not only the input product line, but also the transformation system contains variability. At the heart of our methodology, a staged rule application technique exploits the reuse potential offered by shared portions of the involved products and rules. We showed the correctness of our technique and demonstrated its benefit by applying it to a practical software engineering task.

In the future, we aim to explore further variability dimensions, e.g., metamodel variability as considered in [41], and to extend our work to advanced transformation features, such as application conditions. We aim to address additional variability mechanisms and to perform a broader evaluation.

**Acknowledgement.** We thank Rick Salay and the anonymous reviewers for their constructive feedback. This work was supported by the Deutsche Forschungsgemeinschaft (DFG), project *SecVolution@Run-time*, no. 221328183.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Author Index

Bensalem, Saddek 94
Brandt, Jacco 56
Búr, Márton 111
Chechik, Marsha 169, 319
Chen, Bo 281
Cong, Kai 281
Diamantopoulos, Themistoklis 189
Dimovski, Aleksandar S. 301
Disenfeld, Cynthia 319
Diskin, Zinovy 21
Ghezzi, Carlo 169
Gioulekas, Fotios 94
Gupta, Indranil 77
Havlicek, Christopher 281
Huistra, David 56
Jürjens, Jan 337
Kannavara, Raghudeep 281
Katirtzis, Nikolaos 189
Katsaros, Panagiotis 94
Kehrer, Timo 3
Kelter, Udo 3
König, Harald 21
Koroglu, Yavuz 264
Kosmatov, Nikolai 207
Kroening, Daniel 246
Kulcsár, Géza 38
Kumar, Rajesh 56
Landsberg, David 246
Lawford, Mark 21
Le Gall, Pascale 207
Leblebici, Erhan 38
Léchenet, Jean-Christophe 207
Liu, Si 77
Lochau, Malte 38
Marmsoler, Diego 149
Menghi, Claudio 169
Meseguer, José 77

Ölveczky, Peter Csaba 77
Palomo, Pedro 94
Park, Joonyoung 129
Peldszus, Sven 38, 337
Pietsch, Christopher 3
Poplavko, Peter 94
Rensink, Arend 56
Rubin, Julia 319
Ruijters, Enno 56
Ruland, Sebastian 38
Ryu, Sukyoung 129
Santhanam, Keshav 77
Schivo, Stefano 56
Semeráth, Oszkár 227
Sen, Alper 264
Spoletini, Paola 169
Stavropoulou, Ioanna 319
Stoelinga, Mariëlle 56
Strüber, Daniel 337
Sun, Kwangwon 129
Sun, Youcheng 246
Sutton, Charles 189
Szilágyi, Gábor 111
Taentzer, Gabriele 3
Varró, Dániel 111, 227
Vörös, András 111
Wang, Qi 77
Xie, Fei 281
Yang, Zhenkun 281
Yildiz, Buǧra Mehmet 56